Abstract

The multiple description (MD) problem has received considerable attention as a model of information transmission 
over unreliable channels. A general framework for designing efficient multiple description quantization schemes is 
proposed in this paper. We provide a systematic treatment of the El Gamal-Cover (EGC) achievable MD rate-distortion 
region, and show that any point in the EGC region can be achieved via a successive quantization scheme along with 
quantization splitting. For the quadratic Gaussian case, the proposed scheme has an intrinsic connection with the 
Gram-Schmidt orthogonalization, which implies that the whole Gaussian MD rate-distortion region is achievable with 
a sequential dithered lattice-based quantization scheme as the dimension of the (optimal) lattice quantizers becomes 
large. Moreover, this scheme is shown to be universal for all i.i.d. smooth sources with performance no worse than 
that for an i.i.d. Gaussian source with the same variance and asymptotically optimal at high resolution. A class 
of low-complexity MD scalar quantizers in the proposed general framework is also constructed and is illustrated
geometrically; the performance is analyzed in the high resolution regime, which exhibits a noticeable improvement 
over the existing MD scalar quantization schemes. 

Index Terms 

Gram-Schmidt orthogonalization, lattice quantization, MMSE, multiple description, quantization splitting. 

I. Introduction 

In the multiple description problem the total available bit rate is split between two channels and either channel 
may be subject to failure. It is desired to allocate rate and coded representations between the two channels, such 
that if one channel fails, an adequate reconstruction of the source is possible, but if both channels are available, an 
improved reconstruction over the single-channel reception results. The formal definition of the MD problem is as 
follows (also see Fig. 1). 

Let $\{X(t)\}_{t=1}^{\infty}$ be an i.i.d. random process with $X(t) \sim p(x)$ for all $t$. Let $d(\cdot, \cdot) : \mathcal{X} \times \mathcal{X} \to [0, d_{\max}]$ be a distortion measure.
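Definition 1.1: The quintuple $(R_1, R_2, D_1, D_2, D_3)$ is called achievable if for all $\epsilon > 0$, there exist, for $n$ sufficiently large, encoding functions

$$f_i^{(n)} : \mathcal{X}^n \to \mathcal{C}_i^{(n)}, \quad \log|\mathcal{C}_i^{(n)}| \le n(R_i + \epsilon), \quad i = 1, 2,$$

and decoding functions

$$g_i^{(n)} : \mathcal{C}_i^{(n)} \to \mathcal{X}^n, \quad i = 1, 2,$$

$$g_3^{(n)} : \mathcal{C}_1^{(n)} \times \mathcal{C}_2^{(n)} \to \mathcal{X}^n,$$

such that for $\hat{\mathbf{X}}_i = g_i^{(n)}(f_i^{(n)}(\mathbf{X}))$, $i = 1, 2$, and for $\hat{\mathbf{X}}_3 = g_3^{(n)}(f_1^{(n)}(\mathbf{X}), f_2^{(n)}(\mathbf{X}))$,

$$\frac{1}{n}\sum_{t=1}^{n} E\,d(X(t), \hat{X}_i(t)) \le D_i + \epsilon, \quad i = 1, 2, 3.$$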

The MD rate-distortion region, denoted by $\mathcal{Q}$, is the set of all achievable quintuples.

In this paper the encoding functions $f_1^{(n)}$ and $f_2^{(n)}$ are referred to as encoder 1 and encoder 2, respectively. Similarly, the decoding functions $g_1^{(n)}$, $g_2^{(n)}$, and $g_3^{(n)}$ are referred to as decoder 1, decoder 2, and decoder 3, respectively. It should be emphasized that in a real system, encoders 1 and 2 are just two different encoding functions of a single encoder, while decoders 1, 2, and 3 are different decoding functions of a single decoder. In the MD literature, decoders 1 and 2 are also referred to as the side decoders because of their positions in Fig. 1, while decoder 3 is referred to as the central decoder.




Fig. 1. Encoder and decoder diagram for multiple descriptions. 



Early contributions to the MD problem can be found in [1]-[4]. The first general result was El Gamal and Cover's
achievable region. 

Definition 1.2 (EGC region): For random variables $U_1$, $U_2$ and $U_3$ jointly distributed with the generic source variable $X$ via conditional distribution $p(u_1, u_2, u_3|x)$, let

$$\mathcal{R}(U_1, U_2, U_3) = \{(R_1, R_2) : R_1 + R_2 \ge I(X; U_1, U_2, U_3) + I(U_1; U_2),\ R_i \ge I(X; U_i),\ i = 1, 2\}.$$

Let

$$\mathcal{Q}(U_1, U_2, U_3) = \{(R_1, R_2, D_1, D_2, D_3) : (R_1, R_2) \in \mathcal{R}(U_1, U_2, U_3),\ \exists\, \hat{X}_i = g_i(U_i) \text{ with } E\,d(X, \hat{X}_i) \le D_i,\ i = 1, 2, 3\}.$$

The EGC region$^1$ is then defined as

$$\mathcal{Q}_{EGC} = \mathrm{conv}\Big(\bigcup_{p(u_1, u_2, u_3|x)} \mathcal{Q}(U_1, U_2, U_3)\Big),$$

where $\mathrm{conv}(S)$ denotes the convex hull of $S$ for any set $S$ in the Euclidean space.

It was proved in [5] that $\mathcal{Q}_{EGC} \subseteq \mathcal{Q}$. Ozarow [3] showed that $\mathcal{Q}_{EGC} = \mathcal{Q}$ for the quadratic Gaussian source. Ahlswede [6] showed that the EGC region is also tight for the "no excess sum-rate" case. Zhang and Berger [7] constructed a counterexample for which $\mathcal{Q}_{EGC} \subsetneq \mathcal{Q}$. Further results can be found in [8]-[14]. The MD problem has also been generalized to the $n$-channel case [15], [16], but even the quadratic Gaussian case is far from being completely understood. The extension of the MD problem to the distributed source coding scenario has been considered in [17], [18], where the problem again remains widely open.

The first constructive method to generate multiple descriptions is the multiple description scalar quantization 
(MDSQ), which was proposed by Vaishampayan [19], [20]. The key component of this method is the index 
assignment, which maps an index to an index pair as the two descriptions. However, the design of the index 
assignment turns out to be a difficult problem. Since the optimal solution cannot be found efficiently, Vaishampayan [19] provided several heuristic methods to construct balanced index assignments which are not optimal but are likely to perform well. The analysis of this class of balanced quantizers reveals that asymptotically (in rate) it is 3.07 dB away from the rate-distortion bound [21] in terms of the central and side distortion product when a uniform central quantizer is used; this granular distortion gap can be reduced by 0.4 dB when the central quantizer cells are better optimized [22]. The design of balanced index assignments was recently addressed more thoroughly in [23] from an algorithmic perspective, and index assignments for more than two descriptions appeared in [24]. Other methods have also been proposed to optimize the index assignments [25], [26].

The framework of MDSQ was later extended to multiple description lattice vector quantization (MDLVQ) for 
balanced descriptions in [27] and for the asymmetric case in [28]. The design relies heavily on the choice of 
lattice/sublattice structure to facilitate the construction of index assignments. The analysis of these quantizers shows
that the constructions are high-resolution optimal in asymptotically high dimensions; however, in lower dimensions,
optimization of the code cells can also improve the high-resolution performance [29], [30]. The major difficulty in
constructing both MDSQ and MDLVQ is to find good index assignments, and thus it would simplify the overall 
design significantly if the component of index assignment can be eliminated altogether. 

Frank-Dayan and Zamir [31] proposed a class of MD schemes which use entropy-coded dithered lattice quantizers (ECDQs). The system consists of two independently dithered lattice quantizers as the two side quantizers, with a possible third dithered lattice quantizer to provide refinement information for the central decoder. It was found that even with the quadratic Gaussian source, this system is only optimal in asymptotically high dimensions for degenerate cases such as successive refinement and the "no excess marginal-rate" case, but not optimal in general. The difficulty lies in generating dependent quantization errors of two side quantizers to simulate the Gaussian multiple description test channel. Several possible improvements were proposed in [31], but the problem remains unsolved.

$^1$The form of the EGC region here is slightly different from the one given in [5], but it is straightforward to show that they are equivalent. $g_3(U_3)$ can also be replaced by a function of $(U_1, U_2, U_3)$, say $g(U_1, U_2, U_3)$, but the resulting $\mathcal{Q}_{EGC}$ is still the same, because for any $(U_1, U_2, U_3)$ jointly distributed with $X$ there exist $(U_1, U_2, U_3')$ with $U_3' = (U_1, U_2, U_3)$ such that $g(U_1, U_2, U_3) = g_3(U_3')$.

The method of MD coding using correlating transforms was first proposed by Orchard, Wang, Vaishampayan, and Reibman [32], [33], and this technique has since been further developed in [34] and [35]. However, the transform-based approach is mainly designed for vector sources, and it is most suitable when the redundancy between the descriptions is kept relatively low.

In this paper we provide a systematic treatment of the El Gamal-Cover (EGC) achievable MD rate-distortion region and show that it can be decomposed into a simplified-EGC (SEGC) region and a superimposed refinement operation. Furthermore, any point in the SEGC region can be achieved via a successive quantization scheme along with quantization splitting. For the quadratic Gaussian case, the MD rate-distortion region is the same as the SEGC region, and the proposed scheme has an intrinsic connection with the Gram-Schmidt orthogonalization method. Thus we use single-description ECDQs with independent subtractive dithers as building blocks for this MD coding scheme, by which the difficulty of generating dependent quantization errors is circumvented. Analytical expressions
for the rate-distortion performance of this system are then derived for general sources, and compared to the optimal 
rate regions at both high and low lattice dimensions. 

The proposed scheme is conceptually different from those in [31], and it can achieve the whole Gaussian MD 
rate-distortion region as the dimension of the (optimal) lattice quantizers becomes large, unlike the method proposed 
in [31]. From a construction perspective, the new MD coding system can be realized by 2-3 conventional lattice 
quantizers along with some linear operations, and thus it is considerably simpler than MDSQ and MDLVQ by 
removing the index assignment and the reliance on the lattice/sublattice structure. Though the proposed coding 
scheme suggests many possible implementations of practical quantization methods, the focus of this article is on the 
information-theoretic framework; thus instead of providing detailed designs of quantizers, a geometric interpretation of the scalar MD quantization scheme is given as an illustration to connect the information-theoretic description of the coding scheme and its practical counterpart.

The remainder of this paper is divided into six sections. In Section II, ECDQ and the Gram-Schmidt orthogonalization method are briefly reviewed and a connection between the successive quantization scheme and the
Gram-Schmidt orthogonalization method is established. In Section III we present a systematic treatment of the 
EGC region and show the sufficiency of a successive quantization scheme along with quantization splitting. In 
Section IV the quadratic Gaussian case is considered in more depth. In Section V the proposed scheme based on 
ECDQ is shown to be universal for all i.i.d. smooth sources with performance no worse than that for an i.i.d. 
Gaussian source with the same variance and asymptotically optimal at high resolution. A geometric interpretation 
of the scalar MD quantization scheme in our framework is given in Section VI. Some further extensions are 
suggested in Section VII, which also serves as the conclusion. Throughout, we use boldfaced letters to indicate 
(n-dimensional) vectors, capital letters for random objects, and small letters for their realizations. For example, we 
let $\mathbf{X} = (X(1), \cdots, X(n))^T$ and $\mathbf{x} = (x(1), \cdots, x(n))^T$.

II. Entropy-Coded Dithered Quantization and Gram-Schmidt Orthogonalization 

In this section, we first give a brief review of ECDQ, and then explain the difficulty of applying ECDQ directly 
to the MD problem. As a method to resolve this difficulty, the Gram-Schmidt orthogonalization is introduced and a 
connection between the sequential (dithered) quantization and the Gram-Schmidt orthogonalization is established. 
The purpose of this section is two-fold: The first is to review related results on ECDQ and the Gram-Schmidt 
orthogonalization and show their connection, while the second is to explicate the intuition that motivated this work. 

A. Review of Entropy-Coded Dithered Quantization 

Some basic definitions and properties of ECDQ from [31] are quoted below. More detailed discussion and 
derivation can be found in [36]-[39]. 

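An $n$-dimensional lattice quantizer is formed from a lattice $\mathcal{L}_n$. The quantizer $Q_n(\cdot)$ maps each vector $\mathbf{x} \in \mathbb{R}^n$ into the lattice point $\mathbf{l}_i \in \mathcal{L}_n$ that is nearest to $\mathbf{x}$. The region of all $n$-vectors mapped into a lattice point $\mathbf{l}_i \in \mathcal{L}_n$ is the Voronoi region

$$\mathcal{V}(\mathbf{l}_i) = \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x} - \mathbf{l}_i\| \le \|\mathbf{x} - \mathbf{l}_j\|,\ \forall j \ne i\}.$$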

The dither $\mathbf{Z}$ is an $n$-dimensional random vector, independent of the source, and uniformly distributed over the basic cell $\mathcal{V}_0$ of the lattice, which is the Voronoi region of the lattice point $\mathbf{0}$. The dither vector is assumed to be available to both the encoder and the decoder. The normalized second moment $G_n$ of the lattice characterizes the second moment of the dither vector

$$\frac{1}{n}E\|\mathbf{Z}\|^2 = G_n V^{2/n},$$

where $V$ denotes the volume of $\mathcal{V}_0$. Both the entropy encoder and the decoder are conditioned on the dither sample $\mathbf{Z}$; furthermore, the entropy coder is assumed to be ideal. The lattice quantizer with dither represents the source vector $\mathbf{X}$ by the vector $\mathbf{W} = Q_n(\mathbf{X} + \mathbf{Z}) - \mathbf{Z}$. The resulting properties of the ECDQ are as follows.

1) The quantization error vector $\mathbf{W} - \mathbf{X}$ is independent of $\mathbf{X}$ and is distributed as $-\mathbf{Z}$. In particular, the mean-squared quantization error is given by the second moment of the dither, independently of the source distribution, i.e.,

$$\frac{1}{n}E\|\mathbf{W} - \mathbf{X}\|^2 = \frac{1}{n}E\|\mathbf{Z}\|^2 = G_n V^{2/n}.$$

2) The coding rate of the ECDQ is equal to the mutual information between the input and output of an additive noise channel $\mathbf{Y} = \mathbf{X} + \mathbf{N}$, where $\mathbf{N}$, the channel's noise, has the same probability density function as $-\mathbf{Z}$ (see Fig. 2),

$$H(Q_n(\mathbf{X} + \mathbf{Z})|\mathbf{Z}) = I(\mathbf{X}; \mathbf{Y}) = h(\mathbf{Y}) - h(\mathbf{N}).$$
3) For optimal lattice quantizers, i.e., lattice quantizers with the minimal normalized second moment $G_n$, the autocorrelation of the quantizer noise is "white", i.e., $E\mathbf{Z}\mathbf{Z}^T = \sigma^2 I_n$, where $I_n$ is the $n \times n$ identity matrix, $\sigma^2 = G_n^{opt} V^{2/n}$ is the second moment of the lattice, and

$$G_n^{opt} = \min_{\mathcal{L}_n} \frac{\int_{\mathcal{V}_0} \|\mathbf{x}\|^2 \, d\mathbf{x}}{n V^{1 + 2/n}}$$

is the minimal normalized second moment of an $n$-dimensional lattice.
Fig. 2. ECDQ and its equivalent additive-noise channel.
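For the scalar lattice $\Delta\mathbb{Z}$ ($n = 1$, $V = \Delta$, $G_1 = 1/12$), property 1) and the second-moment formula can be checked by simulation. The sketch below is illustrative only: it assumes NumPy, a Laplacian test source chosen to emphasize that the source need not be Gaussian, and a hypothetical helper name `ecdq_scalar`.

```python
import numpy as np

def ecdq_scalar(x, step, rng):
    """Subtractively dithered quantizer on the scalar lattice step * Z.

    Returns W = Q(X + Z) - Z, where the dither Z ~ U(-step/2, step/2) is drawn
    independently of the source (and is assumed known to encoder and decoder).
    """
    z = rng.uniform(-step / 2, step / 2, size=x.shape)  # dither over the basic cell
    q = step * np.round((x + z) / step)                  # nearest lattice point
    return q - z

rng = np.random.default_rng(0)
step = 0.5
x = rng.laplace(scale=1.0, size=200_000)  # a non-Gaussian source, for illustration
err = ecdq_scalar(x, step, rng) - x       # quantization error W - X

mse = np.mean(err ** 2)           # should be close to G_1 * V^2 = step**2 / 12
corr = np.corrcoef(err, x)[0, 1]  # should be close to 0, consistent with property 1)
```

The empirical mean-squared error stays near $\Delta^2/12$ whatever source distribution is used, which is the point of property 1).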

Consider the following problem to motivate the general result. Suppose a quantization system is needed with input $X_1$ and outputs $(\tilde{X}_2, \cdots, \tilde{X}_M)$ such that the quantization errors $\tilde{X}_i - X_1$, $i = 2, \cdots, M$, are correlated with each other in a certain predetermined way, but are uncorrelated with $X_1$. Seemingly, $M - 1$ quantizers may be used, each with $X_1$ as the input and $\tilde{X}_i$ as the output for some $i$, $i = 2, \cdots, M$. By property 1) of ECDQ, if dithers are introduced, the quantization errors are uncorrelated with (actually independent of) the input of the quantizer. However, it is difficult to make the quantization errors of these $M - 1$ quantizers correlated in the desired manner. One may expect that the quantization errors can be correlated by simply correlating the dithers of different quantizers, but as pointed out in [31], this turns out not to be true. Next, we present a solution to this problem by exploiting the relationship between the Gram-Schmidt orthogonalization and sequential (dithered) quantization.

B. Gram-Schmidt Orthogonalization 

In order to facilitate the treatment, the problem is reformulated in an equivalent form: Given $X_1^M$ with an arbitrary covariance matrix, construct a quantization system with $X_1$ as the input and $(\tilde{X}_2, \cdots, \tilde{X}_M)$ as the outputs such that, with $\tilde{X}_1 = X_1$, the covariance matrices of $\tilde{X}_1^M$ and $X_1^M$ are the same.
Let $\mathcal{H}_s$ denote the set of all finite-variance, zero-mean, real scalar random variables. It is well known [40], [41] that $\mathcal{H}_s$ becomes a Hilbert space under the inner product mapping

$$\langle X, Y \rangle = E(XY) : \mathcal{H}_s \times \mathcal{H}_s \to \mathbb{R}.$$

The norm induced by this inner product is

$$\|X\|^2 = \langle X, X \rangle = E X^2.$$

For $X_1^M = (X_1, \cdots, X_M)^T$ with $X_i \in \mathcal{H}_s$, $i = 1, \cdots, M$, the Gram-Schmidt orthogonalization can be used to construct an orthogonal basis $B_1^M = (B_1, \cdots, B_M)^T$ for $X_1^M$. Specifically, the Gram-Schmidt orthogonalization proceeds as follows:

$$B_1 = X_1,$$

$$B_i = X_i - \sum_{j=1}^{i-1} \frac{\langle X_i, B_j \rangle}{\|B_j\|^2} B_j, \quad i = 2, \cdots, M.$$

Note that $\frac{\langle X_i, B_j \rangle}{\|B_j\|^2} = \frac{E(X_i B_j)}{E B_j^2}$ can assume any real number if $B_j = 0$. Alternatively, $B_1^M$ can also be computed using the method of linear estimation. Let $K_{X_1^m}$ denote the covariance matrix of $(X_1, \cdots, X_m)^T$ and let $K_{X_m X_1^{m-1}} = E[X_m (X_1, \cdots, X_{m-1})^T]$; then

$$B_1 = X_1, \tag{1}$$

$$B_i = X_i - K_{i-1} X_1^{i-1}, \quad i = 2, \cdots, M. \tag{2}$$

Here $K_{i-1}$ is a row vector satisfying $K_{i-1} K_{X_1^{i-1}} = K_{X_i X_1^{i-1}}$. When $K_{X_1^{i-1}}$ is invertible, $K_{i-1}$ is uniquely given by $K_{X_i X_1^{i-1}} K_{X_1^{i-1}}^{-1}$. The product $K_{i-1} X_1^{i-1}$ is the linear MMSE estimate of $X_i$ given $X_1^{i-1}$, and $E B_i^2$ is its corresponding linear MMSE estimation error.

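The Gram-Schmidt orthogonalization is closely related to the $LDL^T$ factorization. That is, if all leading minors of $K_{X_1^M}$ are nonzero, then there exists a unique factorization such that $K_{X_1^M} = LDL^T$, where $D$ is diagonal and $L$ is lower triangular with unit diagonal. Specifically, $D = \mathrm{diag}\{\|B_1\|^2, \|B_2\|^2, \cdots, \|B_M\|^2\}$ and

$$L = \begin{pmatrix} 1 & & & & \\ \frac{\langle X_2, B_1 \rangle}{\|B_1\|^2} & 1 & & & \\ \frac{\langle X_3, B_1 \rangle}{\|B_1\|^2} & \frac{\langle X_3, B_2 \rangle}{\|B_2\|^2} & 1 & & \\ \vdots & \vdots & \vdots & \ddots & \\ \frac{\langle X_M, B_1 \rangle}{\|B_1\|^2} & \frac{\langle X_M, B_2 \rangle}{\|B_2\|^2} & \frac{\langle X_M, B_3 \rangle}{\|B_3\|^2} & \cdots & 1 \end{pmatrix}.$$

The vector $B_1^M = L^{-1} X_1^M$ is sometimes referred to as the innovation process [40].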
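Because the inner product is $\langle X_i, X_j \rangle = K[i, j]$, the Gram-Schmidt recursion can be run on the covariance matrix alone, and it produces exactly the factors $L$ and $D$ above. The sketch below assumes NumPy, a positive-definite example covariance chosen for illustration, and a hypothetical function name `gram_schmidt_ldl`.

```python
import numpy as np

def gram_schmidt_ldl(K):
    """Gram-Schmidt on X_1..X_M using only their covariance K.

    Returns the unit lower-triangular L with L[i, j] = <X_i, B_j> / ||B_j||^2
    and d[j] = ||B_j||^2, so that X = L B and K = L diag(d) L^T.
    Assumes every ||B_j||^2 > 0 (all leading minors of K nonzero).
    """
    M = K.shape[0]
    L = np.eye(M)
    C = np.zeros((M, M))  # C[i, j] = <X_i, B_j>
    d = np.zeros(M)
    for j in range(M):
        # B_j = X_j - sum_{k<j} L[j, k] B_k, hence
        # <X_i, B_j> = K[i, j] - sum_{k<j} L[j, k] <X_i, B_k>
        C[j:, j] = K[j:, j] - C[j:, :j] @ L[j, :j]
        d[j] = C[j, j]  # ||B_j||^2 = <X_j, B_j> since B_j is orthogonal to B_k, k < j
        L[j + 1:, j] = C[j + 1:, j] / d[j]
    return L, d

K = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])
L, d = gram_schmidt_ldl(K)
Linv = np.linalg.inv(L)  # B = L^{-1} X: the innovation process
```

For this $K$ the recursion gives $d = (4, 2, 1.75)$ and $L_{21} = 0.5$, $L_{31} = 0.25$, $L_{32} = 0$. The same factors can be read off a Cholesky factor $C$ of $K$ via $D = \mathrm{diag}(C)^2$ and $L = C\,\mathrm{diag}(C)^{-1}$, which is a convenient cross-check.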
In the special case in which $X_1^M$ are jointly Gaussian, the elements of $B_1^M$ are given by

$$B_1 = X_1,$$

$$B_i = X_i - E(X_i|X_1^{i-1}) = X_i - \sum_{j=1}^{i-1} E(X_i|B_j), \quad i = 2, \cdots, M,$$

and $B_i$ are zero-mean, independent and jointly Gaussian. Moreover, since $X_1^i$ is a deterministic function of $B_1^i$, it follows that $B_{i+1}$ is independent of $X_1^i$, for $i = 1, \cdots, M-1$. Note: For $i = 2, \cdots, M$, $E(X_i|X_1^{i-1})$ (or $\sum_{j=1}^{i-1} E(X_i|B_j)$) is a sufficient statistic$^2$ for estimation of $X_i$ from $X_1^{i-1}$ (or $B_1^{i-1}$); $E(X_i|X_1^{i-1})$ (or $\sum_{j=1}^{i-1} E(X_i|B_j)$) is also the MMSE estimate of $X_i$ given $X_1^{i-1}$ (or $B_1^{i-1}$), and $E B_i^2$ is the MMSE estimation error.

We now show that one can construct a sequential quantization system with $X_1$ as the input to generate a zero-mean random vector $\tilde{X}_1^M = (\tilde{X}_1, \tilde{X}_2, \cdots, \tilde{X}_M)^T$ whose covariance matrix is also $K_{X_1^M}$. Let $X_1^M$ be a zero-mean random vector with covariance matrix $K_{X_1^M}$. By (1) and (2), it is true that

$$X_1 = B_1, \tag{3}$$

$$X_i = K_{i-1} X_1^{i-1} + B_i, \quad i = 2, \cdots, M. \tag{4}$$

Assume that $B_i \ne 0$ for $i = 2, \cdots, M$. Let $Q_{i,1}(\cdot)$ be a scalar lattice quantizer with step size $\Delta_i = \sqrt{12 E B_{i+1}^2}$, $i = 1, 2, \cdots, M-1$. Let the dither $Z_i \sim \mathcal{U}(-\Delta_i/2, \Delta_i/2)$ be a random variable uniformly distributed over the basic cell of $Q_{i,1}$, $i = 1, 2, \cdots, M-1$. Note: the second subscript $n$ of $Q_{i,n}$ denotes the dimension of the lattice quantizer. In this case $n = 1$, so it is a scalar quantizer. Suppose $(X_1, Z_1, \cdots, Z_{M-1})$ are independent. Define

$$\tilde{X}_1 = X_1,$$

$$\tilde{X}_i = Q_{i-1,1}(K_{i-1} \tilde{X}_1^{i-1} + Z_{i-1}) - Z_{i-1}, \quad i = 2, \cdots, M.$$

By property 1) of the ECDQ, we have

$$\tilde{X}_1 = X_1, \tag{5}$$

$$\tilde{X}_i = K_{i-1} \tilde{X}_1^{i-1} + N_{i-1}, \quad i = 2, \cdots, M, \tag{6}$$

where $N_i \sim \mathcal{U}(-\Delta_i/2, \Delta_i/2)$ with $E N_i^2 = E B_{i+1}^2$, $i = 1, \cdots, M-1$, and $(X_1, N_1, \cdots, N_{M-1})$ are independent. By comparing (3), (4) with (5), (6), it is straightforward to verify that $X_1^M$ and $\tilde{X}_1^M$ have the same covariance matrix.
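The construction in (5)-(6) can be simulated directly. The sketch below is a minimal illustration, assuming NumPy, a positive-definite $K_{X_1^M}$ (so every $E B_i^2 > 0$), and a Gaussian $X_1$ chosen only for convenience; `sequential_ecdq` is a hypothetical name. Each stage forms the linear MMSE prediction $K_{i-1}\tilde{X}_1^{i-1}$, applies a dithered uniform quantizer whose step satisfies $\Delta^2/12 = E B_i^2$, and subtracts the dither. The empirical covariance of the outputs should then match $K_{X_1^M}$.

```python
import numpy as np

def sequential_ecdq(x1, K, rng):
    """Sequential dithered scalar quantization with X_1 as the only input.

    Returns an (M, N) array whose row m (0-based) holds samples of X~_{m+1}.
    Assumes K is positive definite and x1 has variance K[0, 0].
    """
    M = K.shape[0]
    xt = np.zeros((M, x1.size))
    xt[0] = x1                                        # X~_1 = X_1
    for i in range(1, M):
        coef = np.linalg.solve(K[:i, :i], K[:i, i])   # K_{i-1} (K is symmetric)
        eb2 = K[i, i] - coef @ K[:i, i]               # linear MMSE error E B^2 of this stage
        step = np.sqrt(12.0 * eb2)                    # step^2 / 12 = E B^2
        z = rng.uniform(-step / 2, step / 2, size=x1.size)  # independent dither
        pred = coef @ xt[:i]                          # K_{i-1} X~_1^{i-1}
        xt[i] = step * np.round((pred + z) / step) - z
    return xt

K = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])
rng = np.random.default_rng(1)
x1 = rng.normal(0.0, 2.0, size=400_000)  # X_1 with variance K[0, 0] = 4
xt = sequential_ecdq(x1, K, rng)
emp = np.cov(xt)                         # should be close to K
```

Note that the quantizers here are coarse (steps $\sqrt{24}$ and $\sqrt{21}$); the covariance match does not depend on resolution, only on the dithered error being independent of the quantizer input.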

$^2$Actually, $E(X_i|X_1^{i-1})$ (or $\sum_{j=1}^{i-1} E(X_i|B_j)$) is a minimal sufficient statistic; i.e., $E(X_i|X_1^{i-1})$ (or $\sum_{j=1}^{i-1} E(X_i|B_j)$) is a function of every other sufficient statistic $f(X_1^{i-1})$ (or $f(B_1^{i-1})$).

Since the $E B_i^2$ ($i = 2, \cdots, M$) are not necessarily the same, the quantizers $Q_{i,1}(\cdot)$ ($i = 1, \cdots, M-1$) are different in general. But by incorporating linear pre- and post-filters [38], all these quantizers can be made identical. Specifically, given a scalar lattice quantizer $Q_1(\cdot)$ with step size $\Delta$, let the dither $Z_i' \sim \mathcal{U}(-\Delta/2, \Delta/2)$ be a random variable uniformly distributed over the basic cell of $Q_1$, $i = 1, 2, \cdots, M-1$. Suppose $(X_1, Z_1', \cdots, Z_{M-1}')$ are independent. Define

$$\hat{X}_1 = X_1,$$

$$\hat{X}_i = a_{i-1}\left[Q_1\left(\frac{K_{i-1} \hat{X}_1^{i-1}}{a_{i-1}} + Z_{i-1}'\right) - Z_{i-1}'\right], \quad i = 2, \cdots, M,$$

where $a_i = \pm\sqrt{12 E B_{i+1}^2 / \Delta^2}$, $i = 1, 2, \cdots, M-1$. By property 1) of the ECDQ, it is again straightforward to verify that $X_1^M$ and $\hat{X}_1^M$ have the same covariance matrix. Essentially, by introducing the prefilter $1/a_i$ and the postfilter $a_i$, the quantizer $Q_1(\cdot)$ is converted to the quantizer $Q_{i,1}(\cdot)$ for which

$$Q_{i,1}(x) = a_i Q_1\left(\frac{x}{a_i}\right).$$

This is referred to as the shaping [37] of the quantizer $Q_1(\cdot)$ by $a_i$. In the case where $\Delta^2 = 12$, we have $E(Z_i')^2 = 1$, $i = 1, \cdots, M-1$, and the constructed sequential (dithered) quantization system can be regarded as a simulation of Gram-Schmidt orthonormalization.
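The shaping step can be sketched in a few lines. Under the assumptions below (NumPy, $\Delta^2 = 12$ so that $E(Z_i')^2 = 1$, a target $E B^2 = 1.75$, and hypothetical helper names `q1` and `shaped_ecdq`), one common quantizer with prefilter $1/a$ and postfilter $a$ yields a uniform error whose second moment is $a^2\Delta^2/12$, and $a\,Q_1(x/a)$ coincides with a uniform quantizer of step $a\Delta$ for $a > 0$.

```python
import numpy as np

def q1(x, step):
    """The common scalar lattice quantizer Q_1 with step size `step`."""
    return step * np.round(x / step)

def shaped_ecdq(x, a, step, rng):
    """Prefilter 1/a, dithered Q_1, postfilter a: a * (Q_1(x/a + Z') - Z').

    Z' ~ U(-step/2, step/2); the error is uniform with second moment
    a**2 * step**2 / 12.
    """
    z = rng.uniform(-step / 2, step / 2, size=x.shape)
    return a * (q1(x / a + z, step) - z)

rng = np.random.default_rng(2)
step = np.sqrt(12.0)  # Delta^2 = 12, so E(Z')^2 = 1
a = np.sqrt(1.75)     # a = sqrt(12 * E B^2 / Delta^2) with E B^2 = 1.75
x = rng.normal(size=200_000)
err = shaped_ecdq(x, a, step, rng) - x
```

With this choice the postfilter gain alone sets the error variance, which is why a single quantizer $Q_1$ suffices for every stage.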

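If $B_i = 0$ for some $i$, then $\tilde{X}_i = K_{i-1} \tilde{X}_1^{i-1}$ (or $\hat{X}_i = K_{i-1} \hat{X}_1^{i-1}$), and therefore no quantization operation is needed to generate $\tilde{X}_i$ (or $\hat{X}_i$) from $\tilde{X}_1^{i-1}$ (or $\hat{X}_1^{i-1}$).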

The generalization of the correspondence between the Gram-Schmidt orthogonalization and the sequential (dithered) 
quantization to the vector case is straightforward; see Appendix I. 

III. Successive Quantization and Quantization Splitting 

In this section, an information-theoretic analysis of the EGC region is provided. Two coding schemes, namely 
successive quantization and quantization splitting, are subsequently introduced. Together with Gram-Schmidt orthogonalization, they are the main components of the quantization schemes that will be presented in the next two
sections. 



A. An Information-Theoretic Analysis of the EGC Region

Rewrite $\mathcal{R}(U_1, U_2, U_3)$ in the following form:

$$\mathcal{R}(U_1, U_2, U_3) = \{(R_1, R_2) : R_1 + R_2 \ge I(X; U_1, U_2) + I(U_1; U_2) + I(X; U_3|U_1, U_2),\ R_i \ge I(X; U_i),\ i = 1, 2\}.$$

Without loss of generality, assume that $X \to U_3 \to (U_1, U_2)$ form a Markov chain, since otherwise $U_3$ can be replaced by $U_3' = (U_1, U_2, U_3)$ without affecting the rate and distortion constraints. Therefore $U_3$ can be viewed as a fine description of $X$ and $(U_1, U_2)$ as coarse descriptions of $X$. The term $I(X; U_3|U_1, U_2)$ is the rate used for the superimposed refinement from the pair of coarse descriptions $(U_1, U_2)$ to the fine description $U_3$; in general, this refinement rate is split between the two channels. Since description refinement schemes have been studied extensively in the multiresolution or layered source coding scenario and are well understood, this operation can be separated from other parts of the EGC scheme.
Definition 3.1 (SEGC region): For random variables $U_1$ and $U_2$ jointly distributed with the generic source variable $X$ via conditional distribution $p(u_1, u_2|x)$, let

$$\mathcal{R}(U_1, U_2) = \{(R_1, R_2) : R_1 + R_2 \ge I(X; U_1, U_2) + I(U_1; U_2),\ R_i \ge I(X; U_i),\ i = 1, 2\}.$$

Let

$$\mathcal{Q}(U_1, U_2) = \{(R_1, R_2, D_1, D_2, D_3) : (R_1, R_2) \in \mathcal{R}(U_1, U_2),\ \exists\, \hat{X}_1 = g_1(U_1), \hat{X}_2 = g_2(U_2), \hat{X}_3 = g_3(U_1, U_2) \text{ with } E\,d(X, \hat{X}_i) \le D_i,\ i = 1, 2, 3\}.$$

The SEGC region is defined as

$$\mathcal{Q}_{SEGC} = \mathrm{conv}\Big(\bigcup_{p(u_1, u_2|x)} \mathcal{Q}(U_1, U_2)\Big).$$

The SEGC region first appeared in [1] and was attributed to El Gamal and Cover. It was shown in [7] that

$$\mathcal{Q}_{SEGC} \subsetneq \mathcal{Q}_{EGC}.$$

Using the identity

$$I(A; B, C) = I(A; B) + I(A; C) + I(B; C|A) - I(B; C),$$

$\mathcal{R}(U_1, U_2)$ can be written as

$$\mathcal{R}(U_1, U_2) = \{(R_1, R_2) : R_1 + R_2 \ge I(X; U_1) + I(X; U_2) + I(U_1; U_2|X),\ R_i \ge I(X; U_i),\ i = 1, 2\}.$$

The typical shape of $\mathcal{R}(U_1, U_2)$ is shown in Fig. 3.

[Figure: the region plotted in the $(R_1, R_2)$ plane.]

Fig. 3. The shape of $\mathcal{R}(U_1, U_2)$.

It is noteworthy that $\mathcal{R}(U_1, U_2)$ resembles Marton's achievable region [42] for a two-user broadcast channel. This is not surprising, since the proof of the EGC theorem relies heavily on the results in [43], which were originally developed for a simplified proof of Marton's coding theorem for the discrete memoryless broadcast channel. Since the corner points of Marton's region can be achieved via a relatively simple coding scheme due to Gel'fand and Pinsker [44], which for the Gaussian case becomes Costa's dirty paper coding [45], it is natural to conjecture that simple quantization schemes may exist for the corner points of $\mathcal{R}(U_1, U_2)$. This conjecture turns out to be correct, as will be shown below.

Since $I(U_1; U_2|X) \ge 0$, the sum-rate constraint in $\mathcal{R}(U_1, U_2)$ is always effective. Thus

$$\{(R_1, R_2) : R_1 + R_2 = I(X; U_1) + I(X; U_2) + I(U_1; U_2|X),\ R_i \ge I(X; U_i),\ i = 1, 2\}$$

will be called the dominant face of $\mathcal{R}(U_1, U_2)$. Any rate pair inside $\mathcal{R}(U_1, U_2)$ is inferior to some rate pair on the dominant face in terms of compression efficiency. Hence, in searching for the optimal scheme, attention can be restricted to rate pairs on the dominant face without loss of generality. The dominant face of $\mathcal{R}(U_1, U_2)$ has two vertices $V_1$ and $V_2$. Let $(R_1(V_i), R_2(V_i))$ denote the coordinates of vertex $V_i$, $i = 1, 2$; then

$$V_1:\ R_1(V_1) = I(X; U_1), \quad R_2(V_1) = I(X, U_1; U_2);$$

$$V_2:\ R_1(V_2) = I(X, U_2; U_1), \quad R_2(V_2) = I(X; U_2).$$

The expressions of these two vertices directly lead to the following successive quantization scheme. By symmetry, we shall only consider $V_1$.
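For a jointly Gaussian triple $(X, U_1, U_2)$, every mutual information above reduces to log-determinants of covariance submatrices, $I(A; B) = \frac{1}{2}\log\frac{\det K_A \det K_B}{\det K_{AB}}$. The sketch below assumes NumPy and a hypothetical test channel $U_i = X + W_i$ with correlated Gaussian noises $(W_1, W_2)$ independent of $X$; the parameter values and the helper name `gauss_mi` are illustrative. It checks that the two forms of the sum-rate bound agree and that both vertices lie on the dominant face (rates in bits).

```python
import numpy as np

def gauss_mi(K, a, b):
    """I(A;B) in bits for a zero-mean Gaussian vector with covariance K;
    a and b are disjoint lists of coordinate indices."""
    def logdet(idx):
        return np.linalg.slogdet(K[np.ix_(idx, idx)])[1]
    return 0.5 * (logdet(a) + logdet(b) - logdet(a + b)) / np.log(2)

# Hypothetical test channel: U_i = X + W_i, (W_1, W_2) correlated, independent of X.
sx2, s1, s2, rho = 1.0, 0.3, 0.4, -0.5
c12 = rho * np.sqrt(s1 * s2)
K = np.array([[sx2, sx2, sx2],                 # covariance of (X, U1, U2)
              [sx2, sx2 + s1, sx2 + c12],
              [sx2, sx2 + c12, sx2 + s2]])
X, U1, U2 = 0, 1, 2

i_u1u2_given_x = gauss_mi(K, [U1], [U2, X]) - gauss_mi(K, [U1], [X])  # chain rule
sum_a = gauss_mi(K, [X], [U1, U2]) + gauss_mi(K, [U1], [U2])          # first form
sum_b = gauss_mi(K, [X], [U1]) + gauss_mi(K, [X], [U2]) + i_u1u2_given_x
V1 = (gauss_mi(K, [X], [U1]), gauss_mi(K, [X, U1], [U2]))
V2 = (gauss_mi(K, [X, U2], [U1]), gauss_mi(K, [X], [U2]))
```

Since the noises are correlated given $X$, $I(U_1; U_2|X) > 0$ here, so the sum-rate bound strictly exceeds $I(X; U_1) + I(X; U_2)$.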

B. Successive Quantization Scheme 

The successive quantization scheme is given as follows: 

1) Codebook Generation: Encoder 1 independently generates $2^{n[I(X; U_1) + \epsilon_1]}$ codewords $\{\mathbf{U}_1(j)\}_{j=1}^{2^{n[I(X; U_1) + \epsilon_1]}}$ according to the distribution $\prod p(u_1)$. Encoder 2 independently generates $2^{n[I(X, U_1; U_2) + \epsilon_2]}$ codewords $\{\mathbf{U}_2(k)\}_{k=1}^{2^{n[I(X, U_1; U_2) + \epsilon_2]}}$ according to the distribution $\prod p(u_2)$.

2) Encoding Procedure: Given $\mathbf{X}$, encoder 1 finds the codeword $\mathbf{U}_1(j^*)$ such that $\mathbf{U}_1(j^*)$ is strongly typical with $\mathbf{X}$. Then encoder 2 finds the codeword $\mathbf{U}_2(k^*)$ such that $\mathbf{U}_2(k^*)$ is strongly typical with $\mathbf{X}$ and $\mathbf{U}_1(j^*)$. Index $j^*$ is transmitted through channel 1 and index $k^*$ is transmitted through channel 2.

3) Reconstruction: Decoder 1 reconstructs $\hat{\mathbf{X}}_1$ with $\hat{X}_1(t) = g_1(U_1(j^*, t))$. Decoder 2 reconstructs $\hat{\mathbf{X}}_2$ with $\hat{X}_2(t) = g_2(U_2(k^*, t))$. Decoder 3 reconstructs $\hat{\mathbf{X}}_3$ with $\hat{X}_3(t) = g_3(U_1(j^*, t), U_2(k^*, t))$. Here, $U_1(j^*, t)$ and $U_2(k^*, t)$ are the $t$-th entries of $\mathbf{U}_1(j^*)$ and $\mathbf{U}_2(k^*)$, respectively, $t = 1, 2, \cdots, n$.

It can be shown rigorously that

$$\frac{1}{n}\sum_{t=1}^{n} E\,d(X(t), \hat{X}_i(t)) \le E\,d(X, g_i(U_i)) + \epsilon_{2+i}, \quad i = 1, 2,$$

$$\frac{1}{n}\sum_{t=1}^{n} E\,d(X(t), \hat{X}_3(t)) \le E\,d(X, g_3(U_1, U_2)) + \epsilon_5$$

as $n$ goes to infinity, and $\epsilon_i$ ($i = 1, 2, \cdots, 5$) can be made arbitrarily close to zero. The proof is conventional and thus is omitted.

For this scheme, encoder 1 does the encoding first and then encoder 2 follows. The main complexity of this scheme resides in encoder 2, since it needs to construct a codebook that covers the $(X, U_1)$-space instead of just the $X$-space. Observe that, if a function $f(X, U_1) = V$ can be found such that $V$ is a sufficient statistic for estimation of $U_2$ from $(X, U_1)$, i.e., $(X, U_1) \to V \to U_2$ form a Markov chain$^3$, then

$$I(X, U_1; U_2) = I(V; U_2).$$

The importance of this observation is that encoder 2 then only needs to construct a codebook that covers the $V$-space instead of the $(X, U_1)$-space. This is because the Markov lemma [46] implies that if $\mathbf{U}_2$ is jointly typical with $\mathbf{V}$, then $\mathbf{U}_2$ is jointly typical with $(\mathbf{X}, \mathbf{U}_1)$ with high probability. This observation turns out to be crucial for the quadratic Gaussian case.

We point out that the successive coding structure associated with the corner points of $\mathcal{R}(U_1, U_2)$ is not an isolated case in network information theory. Besides its resemblance to the successive Gel'fand-Pinsker coding structure associated with the corner points of Marton's region mentioned previously, other noteworthy examples include the successive decoding structure associated with the corner points of the Slepian-Wolf region [47] (and, more generally, the Berger-Tung region [46], [48], [49]) and the corner points of the capacity region of the memoryless multiaccess channel [50], [51].

C. Successive Quantization Scheme with Quantization Splitting 

A straightforward method to achieve an arbitrary rate pair on the dominant face of $\mathcal{R}(U_1, U_2)$ is time sharing of coding schemes that achieve the two vertices. However, such a scheme requires four quantizers in general. Instead, the scheme based on quantization splitting introduced below needs only three quantizers. Before presenting it, we first prove the following theorem.

Theorem 3.1: For any rate pair $(R_1, R_2)$ on the dominant face of $\mathcal{R}(U_1, U_2)$, there exists a random variable $U_2'$ with $(X, U_1) \to U_2 \to U_2'$ such that

$$R_1 = I(X, U_2'; U_1),$$

$$R_2 = I(X; U_2') + I(X, U_1; U_2|U_2').$$

Similarly, there exists a random variable $U_1'$ with $(X, U_2) \to U_1 \to U_1'$ such that

$$R_1 = I(X; U_1') + I(X, U_2; U_1|U_1'),$$

$$R_2 = I(X, U_1'; U_2).$$
Before proceeding to prove this theorem, we make the following remarks.

• By the symmetry between the two forms, only the statement regarding the first form needs to be proved.

• Since $(X, U_1) \to U_2 \to U_2'$ form a Markov chain, if $U_2'$ is independent of $U_2$, then it must be independent of $(X, U_1, U_2)$ altogether$^4$. Then in this case,

$$R_1 = I(X; U_1), \quad R_2 = I(X, U_1; U_2),$$

which are the coordinates of $V_1$.

• At the other extreme, letting $U_2'$ be $U_2$ gives

$$R_1 = I(X, U_2; U_1), \quad R_2 = I(X; U_2),$$

which are the coordinates of $V_2$.

$^3$Such a function $f(\cdot, \cdot)$ always exists provided $|\mathcal{V}| \ge |\mathcal{X}||\mathcal{U}_1|$.

$^4$This is because $p(x, u_1, u_2|u_2') = p(u_2|u_2') p(x, u_1|u_2, u_2') = p(u_2) p(x, u_1|u_2) = p(x, u_1, u_2)$.

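Proof: First construct a class of transition probabilities$^5$ $p_\epsilon(u_2'|u_2)$ indexed by $\epsilon$ such that $I(U_2; U_2')$ varies continuously from $0$ to $H(U_2)$ as $\epsilon$ changes from $0$ to $1$, with $(X, U_1) \to U_2 \to U_2'$ holding for all the members of this class. By the two remarks above, $R_1 = I(X, U_2'; U_1)$ then moves continuously from $R_1(V_1)$ to $R_1(V_2)$, so every value of $R_1$ on the dominant face is attained for some $\epsilon$. It remains to show that

$$R_1 + R_2 = I(X; U_1) + I(X; U_2) + I(U_1; U_2|X).$$

This is indeed true since

$$\begin{aligned} R_1 + R_2 &= I(X, U_2'; U_1) + I(X; U_2') + I(X, U_1; U_2|U_2') \\ &= I(X, U_2'; U_1) + I(X; U_2') + I(X; U_2|U_2') + I(U_1; U_2|X, U_2') \\ &= I(X, U_2, U_2'; U_1) + I(X; U_2, U_2'). \end{aligned}$$

By the construction $(X, U_1) \to U_2 \to U_2'$, it follows that

$$\begin{aligned} I(X, U_2, U_2'; U_1) + I(X; U_2, U_2') &= I(X, U_2; U_1) + I(X; U_2) \\ &= I(X; U_1) + I(X; U_2) + I(U_1; U_2|X), \end{aligned}$$

which completes the proof. ■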
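Theorem 3.1 can be checked numerically on a finite-alphabet example. The sketch below assumes NumPy, a randomly drawn $p(x, u_1, u_2)$ (hypothetical, for illustration only), and the mixture family $p_\epsilon(u_2'|u_2) = (1 - \epsilon)p(u_2') + \epsilon\,\delta(u_2, u_2')$ of footnote 5; the helper names `entropy`, `cmi` and `split_rates` are illustrative. For each $\epsilon$ it computes $R_1 = I(X, U_2'; U_1)$ and $R_2 = I(X; U_2') + I(X, U_1; U_2|U_2')$ and confirms that the pair stays on the dominant face, with $\epsilon = 0$ giving $V_1$ and $\epsilon = 1$ giving $V_2$.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a pmf stored as an array of any shape."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def cmi(P, A, B, C=()):
    """I(A;B|C) in bits; P is a joint pmf whose axes are the variables,
    and A, B, C are tuples of axis indices."""
    def h(S):
        S = set(S)
        if not S:
            return 0.0
        others = tuple(ax for ax in range(P.ndim) if ax not in S)
        return entropy(P.sum(axis=others) if others else P)
    return h(A + C) + h(B + C) - h(A + B + C) - h(C)

rng = np.random.default_rng(3)
p = rng.dirichlet(np.ones(18)).reshape(2, 3, 3)  # hypothetical p(x, u1, u2)
p_u2 = p.sum(axis=(0, 1))                         # marginal pmf of U2
X, U1, U2, U2P = 0, 1, 2, 3                       # axis indices

def split_rates(eps):
    """Rate pair of Theorem 3.1 with U2' drawn through p_eps(u2'|u2)."""
    pe = (1 - eps) * p_u2[None, :] + eps * np.eye(3)  # pe[u2, u2']
    P = p[:, :, :, None] * pe[None, None, :, :]       # (X, U1) -> U2 -> U2'
    r1 = cmi(P, (X, U2P), (U1,))
    r2 = cmi(P, (X,), (U2P,)) + cmi(P, (X, U1), (U2,), (U2P,))
    return r1, r2

face_sum = cmi(p, (X,), (U1,)) + cmi(p, (X,), (U2,)) + cmi(p, (U1,), (U2,), (X,))
v1 = (cmi(p, (X,), (U1,)), cmi(p, (X, U1), (U2,)))
v2 = (cmi(p, (X, U2), (U1,)), cmi(p, (X,), (U2,)))
```

Intermediate values of $\epsilon$ trace the dominant face between the two vertices, which is the continuity argument used in the proof.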
The successive quantization scheme with quantization splitting is given as follows: 

1) Codebook Generation: Encoder 1 independently generates $2^{n[I(X, U_2'; U_1) + \epsilon_1']}$ codewords $\{\mathbf{U}_1(i)\}_{i=1}^{2^{n[I(X, U_2'; U_1) + \epsilon_1']}}$ according to the marginal distribution $\prod p(u_1)$. Encoder 2 independently generates $2^{n[I(X; U_2') + \epsilon_2']}$ codewords $\{\mathbf{U}_2'(j)\}_{j=1}^{2^{n[I(X; U_2') + \epsilon_2']}}$ according to the marginal distribution $\prod p(u_2')$. For each codeword $\mathbf{U}_2'(j)$, encoder 2 independently generates $2^{n[I(X, U_1; U_2|U_2') + \epsilon_3']}$ codewords $\{\mathbf{U}_2(j, k)\}_{k=1}^{2^{n[I(X, U_1; U_2|U_2') + \epsilon_3']}}$ according to the conditional distribution $\prod_t p(u_2|U_2'(j, t))$. Here $U_2'(j, t)$ is the $t$-th entry of $\mathbf{U}_2'(j)$.

2) Encoding Procedure: Given $\mathbf{X}$, encoder 2 finds the codeword $\mathbf{U}_2'(j^*)$ such that $\mathbf{U}_2'(j^*)$ is strongly typical with $\mathbf{X}$. Then encoder 1 finds the codeword $\mathbf{U}_1(i^*)$ such that $\mathbf{U}_1(i^*)$ is strongly typical with $\mathbf{X}$ and $\mathbf{U}_2'(j^*)$. Finally, encoder 2 finds the codeword $\mathbf{U}_2(j^*, k^*)$ such that $\mathbf{U}_2(j^*, k^*)$ is strongly typical with $\mathbf{X}$, $\mathbf{U}_1(i^*)$ and $\mathbf{U}_2'(j^*)$. Index $i^*$ is transmitted through channel 1. Indices $j^*$ and $k^*$ are transmitted through channel 2.

3) Reconstruction: Decoder 1 reconstructs $\hat{\mathbf{X}}_1$ with $\hat{X}_1(t) = g_1(U_1(i^*, t))$. Decoder 2 reconstructs $\hat{\mathbf{X}}_2$ with $\hat{X}_2(t) = g_2(U_2(j^*, k^*, t))$. Decoder 3 reconstructs $\hat{\mathbf{X}}_3$ with $\hat{X}_3(t) = g_3(U_1(i^*, t), U_2(j^*, k^*, t))$. Here $U_1(i^*, t)$ is the $t$-th entry of $\mathbf{U}_1(i^*)$ and $U_2(j^*, k^*, t)$ is the $t$-th entry of $\mathbf{U}_2(j^*, k^*)$, $t = 1, 2, \cdots, n$.

$^5$There are many ways to construct such a class of transition probabilities. For example, we can let $p_0(u_2'|u_2) = p(u_2')$, $p_1(u_2'|u_2) = \delta(u_2, u_2')$, and set $p_\epsilon(u_2'|u_2) = (1 - \epsilon) p_0(u_2'|u_2) + \epsilon p_1(u_2'|u_2)$. Here $\delta(u_2, u_2') = 1$ if $u_2 = u_2'$ and $0$ otherwise.
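Again, it can be shown rigorously that

$$\frac{1}{n}\sum_{t=1}^{n}E\,d(X(t),\hat X_1(t))\le E\,d(X,g_1(U_1))+\epsilon_4,$$
$$\frac{1}{n}\sum_{t=1}^{n}E\,d(X(t),\hat X_2(t))\le E\,d(X,g_2(U_2))+\epsilon_5,$$
$$\frac{1}{n}\sum_{t=1}^{n}E\,d(X(t),\hat X_3(t))\le E\,d(X,g_3(U_1,U_2))+\epsilon_6$$

as $n$ goes to infinity, where $\epsilon_i$ ($i=1,2,\ldots,6$) can be made arbitrarily close to zero. The proof is standard, so the details are omitted.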

This approach is a natural generalization of the successive quantization scheme for the vertices of $\mathcal{R}(U_1,U_2)$. $U_2'$ can be viewed as a coarse description of $X$ and $U_2$ as a fine description of $X$. The idea of introducing an auxiliary coarse description to convert a joint coding scheme into a successive coding scheme has been widely used in distributed source coding problems [52]-[54]. Similar ideas have also found application in multiaccess communications [55]-[58].

IV. The Gaussian Multiple Description Region 

In this section we apply the general results of the preceding section to the quadratic Gaussian case⁶. The Gaussian MD rate-distortion region is first analyzed to show that $\mathcal{Q}_{\rm EGC}=\mathcal{Q}_{\rm SEGC}$ in this case. Then, by combining the Gram-Schmidt orthogonalization with successive quantization and quantization splitting, we present a coding scheme that achieves the whole Gaussian MD region.

A. An Analysis of the Gaussian MD Region 

Let $\{X^G(t)\}_{t=1}^{\infty}$ be an i.i.d. Gaussian process with $X^G(t)\sim\mathcal{N}(0,\sigma_X^2)$ for all $t$. Let $d(\cdot,\cdot)$ be the squared error distortion measure. For the quadratic Gaussian case, the MD rate-distortion region was characterized in [3], [5], [60]. Namely, $(R_1,R_2,D_1,D_2,D_3)\in\mathcal{Q}$ if and only if

$$R_i\ge\frac{1}{2}\log\frac{\sigma_X^2}{D_i},\quad i=1,2,$$
$$R_1+R_2\ge\frac{1}{2}\log\frac{\sigma_X^2}{D_3}+\frac{1}{2}\log\psi(D_1,D_2,D_3),$$

where

$$\psi(D_1,D_2,D_3)=\begin{cases}1, & D_3<D_1+D_2-\sigma_X^2,\\[4pt] \dfrac{\sigma_X^2D_3}{D_1D_2}, & D_3>\left(\dfrac{1}{D_1}+\dfrac{1}{D_2}-\dfrac{1}{\sigma_X^2}\right)^{-1},\\[10pt] \dfrac{(\sigma_X^2-D_3)^2}{(\sigma_X^2-D_3)^2-\left[\sqrt{(\sigma_X^2-D_1)(\sigma_X^2-D_2)}-\sqrt{(D_1-D_3)(D_2-D_3)}\right]^2}, & \text{otherwise}.\end{cases}$$

The case $D_3<D_1+D_2-\sigma_X^2$ and the case $D_3>(1/D_1+1/D_2-1/\sigma_X^2)^{-1}$ are degenerate. It is easy to verify that for any $(R_1,R_2,D_1,D_2,D_3)\in\mathcal{Q}$ with $D_3<D_1+D_2-\sigma_X^2$, there exist $D_1^*\le D_1$, $D_2^*\le D_2$ such that $(R_1,R_2,D_1^*,D_2^*,D_3)\in\mathcal{Q}$ and $D_3=D_1^*+D_2^*-\sigma_X^2$. Similarly, for any $(R_1,R_2,D_1,D_2,D_3)\in\mathcal{Q}$ with $D_3>(1/D_1+1/D_2-1/\sigma_X^2)^{-1}$, there exists $D_3^*=(1/D_1+1/D_2-1/\sigma_X^2)^{-1}<D_3$ such that $(R_1,R_2,D_1,D_2,D_3^*)\in\mathcal{Q}$. Henceforth we shall only consider the subregion where $(1/D_1+1/D_2-1/\sigma_X^2)^{-1}\ge D_3\ge D_1+D_2-\sigma_X^2$, for which $D_1$, $D_2$ and $D_3$ are all effective.

⁶All our results derived under the assumption of a discrete memoryless source and a bounded distortion measure can be generalized to the quadratic Gaussian case using the technique in [59].
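The three branches of $\psi$ are easy to mix up, so the following minimal Python sketch evaluates $\psi$ and tests membership in $\mathcal{Q}$. It makes three assumptions: rates are in bits (base-2 logarithms), $0<D_i\le\sigma_X^2$, and the function names `psi` and `in_gaussian_md_region` are hypothetical helpers rather than anything defined in the paper. The usage lines check that $\psi$ is continuous at both boundaries of the non-degenerate subregion.

```python
import math


def psi(d1: float, d2: float, d3: float, var: float) -> float:
    """Excess sum-rate factor psi(D1, D2, D3) of the Gaussian MD region.

    var is sigma_X^2; the distortions are assumed to satisfy 0 < D_i <= var.
    """
    if d3 < d1 + d2 - var:                              # degenerate: no excess sum-rate
        return 1.0
    if d3 > 1.0 / (1.0 / d1 + 1.0 / d2 - 1.0 / var):    # degenerate: no excess marginal rate
        return var * d3 / (d1 * d2)
    num = (var - d3) ** 2
    cross = math.sqrt((var - d1) * (var - d2)) - math.sqrt((d1 - d3) * (d2 - d3))
    return num / (num - cross ** 2)


def in_gaussian_md_region(r1, r2, d1, d2, d3, var) -> bool:
    """Check the two marginal constraints and the sum-rate constraint (rates in bits)."""
    return (
        r1 >= 0.5 * math.log2(var / d1)
        and r2 >= 0.5 * math.log2(var / d2)
        and r1 + r2 >= 0.5 * math.log2(var / d3) + 0.5 * math.log2(psi(d1, d2, d3, var))
    )


# psi is continuous across both boundaries of the non-degenerate subregion.
d_hi = 1.0 / (1 / 0.6 + 1 / 0.6 - 1.0)
print(psi(0.6, 0.6, 0.2, 1.0), psi(0.6, 0.6, d_hi - 1e-9, 1.0), psi(0.6, 0.6, d_hi + 1e-9, 1.0))
print(in_gaussian_md_region(1.4, 1.4, 0.2, 0.2, 0.05, 1.0))
```

The explicit branches mirror the piecewise definition above. A more compact closed form would hide the two degenerate cases discussed in the text.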
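Following the approach in [5], let

$$U_1=X^G+T_0+T_1,\tag{7}$$
$$U_2=X^G+T_0+T_2,\tag{8}$$

where $(T_1,T_2)$, $T_0$, $X^G$ are zero-mean, jointly Gaussian and independent, and $E(T_1T_2)=-\sigma_{T_1}\sigma_{T_2}$. Let $\hat X_i^G=E(X^G|U_i)=\alpha_iU_i$ ($i=1,2$), and $\hat X_3^G=E(X^G|U_1,U_2)=\beta_1U_1+\beta_2U_2$, where

$$\alpha_i=\frac{\sigma_X^2}{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_i}^2},\quad i=1,2,$$
$$\beta_1=\frac{\sigma_X^2\sigma_{T_2}}{(\sigma_{T_1}+\sigma_{T_2})(\sigma_X^2+\sigma_{T_0}^2)},\qquad\beta_2=\frac{\sigma_X^2\sigma_{T_1}}{(\sigma_{T_1}+\sigma_{T_2})(\sigma_X^2+\sigma_{T_0}^2)}.$$

Set $E(X^G-\hat X_i^G)^2=D_i$, $i=1,2,3$; then

$$\sigma_{T_0}^2=\frac{D_3\sigma_X^2}{\sigma_X^2-D_3},\tag{9}$$
$$\sigma_{T_i}^2=\frac{D_i\sigma_X^2}{\sigma_X^2-D_i}-\frac{D_3\sigma_X^2}{\sigma_X^2-D_3},\quad i=1,2.\tag{10}$$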
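The following minimal sketch checks (9) and (10) numerically. It maps target distortions to the noise variances, then recomputes $D_1$, $D_2$, $D_3$ from the linear MMSE estimators $\alpha_iU_i$ and $\beta_1U_1+\beta_2U_2$, using only the second-order statistics of model (7)-(8). The helper names and the parameters ($\sigma_X^2=1$, $D=(0.3,0.2,0.08)$, which lie in the non-degenerate subregion) are illustrative assumptions.

```python
import math


def noise_variances(d1, d2, d3, var):
    """Eqs. (9)-(10): (sigma_T0^2, sigma_T1^2, sigma_T2^2) for target D1, D2, D3."""
    s0 = d3 * var / (var - d3)
    s1 = d1 * var / (var - d1) - s0
    s2 = d2 * var / (var - d2) - s0
    return s0, s1, s2


def mmse_distortions(s0, s1, s2, var):
    """Distortions of alpha_i U_i and beta_1 U_1 + beta_2 U_2 under model (7)-(8)."""
    t1, t2 = math.sqrt(s1), math.sqrt(s2)
    v1, v2 = var + s0 + s1, var + s0 + s2          # E U_1^2, E U_2^2
    c12 = var + s0 - t1 * t2                        # E U_1 U_2, since E T_1 T_2 = -sigma_T1 sigma_T2
    dist = [var - var * var / v for v in (v1, v2)]  # E(X - alpha_i U_i)^2 with alpha_i = var / v_i
    beta1 = var * t2 / ((t1 + t2) * (var + s0))
    beta2 = var * t1 / ((t1 + t2) * (var + s0))
    # E(X - beta1 U1 - beta2 U2)^2, using E X U_i = var
    d3 = (var - 2 * (beta1 + beta2) * var
          + beta1 ** 2 * v1 + beta2 ** 2 * v2 + 2 * beta1 * beta2 * c12)
    return dist + [d3]


print(mmse_distortions(*noise_variances(0.3, 0.2, 0.08, 1.0), 1.0))
```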
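With these $\sigma_{T_i}^2$ ($i=0,1,2$), it is straightforward to verify that

$$I(X^G;U_i)=\frac{1}{2}\log\frac{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_i}^2}{\sigma_{T_0}^2+\sigma_{T_i}^2}=\frac{1}{2}\log\frac{\sigma_X^2}{D_i},\quad i=1,2,$$

$$\begin{aligned}
I(X^G;U_1)+I(X^G;U_2)+I(U_1;U_2|X^G)&=\frac{1}{2}\log\frac{\sigma_X^2}{D_1}+\frac{1}{2}\log\frac{\sigma_X^2}{D_2}+\frac{1}{2}\log\frac{(\sigma_{T_0}^2+\sigma_{T_1}^2)(\sigma_{T_0}^2+\sigma_{T_2}^2)}{\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2}\\
&=\frac{1}{2}\log\frac{\sigma_X^2}{D_3}+\frac{1}{2}\log\psi(D_1,D_2,D_3).
\end{aligned}$$

Therefore, we have

$$\begin{aligned}
\mathcal{R}^G(U_1,U_2)&\triangleq\{(R_1,R_2):R_1+R_2\ge I(X^G;U_1)+I(X^G;U_2)+I(U_1;U_2|X^G),\ R_i\ge I(X^G;U_i),\ i=1,2\}\\
&=\left\{(R_1,R_2):R_1+R_2\ge\frac{1}{2}\log\frac{\sigma_X^2}{D_3}+\frac{1}{2}\log\psi(D_1,D_2,D_3),\ R_i\ge\frac{1}{2}\log\frac{\sigma_X^2}{D_i},\ i=1,2\right\}.
\end{aligned}\tag{11}$$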

Hence for the quadratic Gaussian case,

$$\mathcal{Q}=\mathcal{Q}_{\rm EGC}=\mathcal{Q}_{\rm SEGC},$$

and there is no need to introduce $U_3$ (more precisely, $U_3$ can be represented as a deterministic function of $U_1$ and $U_2$).

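The coordinates of the vertices $V_1^G$ and $V_2^G$ of $\mathcal{R}^G(U_1,U_2)$ can be computed as follows:

$$R_1(V_1^G)=I(X^G;U_1)=\frac{1}{2}\log\frac{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_1}^2}{\sigma_{T_0}^2+\sigma_{T_1}^2}=\frac{1}{2}\log\frac{\sigma_X^2}{D_1},\tag{12}$$
$$R_2(V_1^G)=I(X^G,U_1;U_2)=\frac{1}{2}\log\frac{(\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2)(\sigma_{T_0}^2+\sigma_{T_1}^2)}{\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2}=\frac{1}{2}\log\frac{D_1}{D_3}+\frac{1}{2}\log\psi(D_1,D_2,D_3),\tag{13}$$

and

$$R_1(V_2^G)=I(X^G,U_2;U_1)=\frac{1}{2}\log\frac{(\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_1}^2)(\sigma_{T_0}^2+\sigma_{T_2}^2)}{\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2}=\frac{1}{2}\log\frac{D_2}{D_3}+\frac{1}{2}\log\psi(D_1,D_2,D_3),\tag{14}$$
$$R_2(V_2^G)=I(X^G;U_2)=\frac{1}{2}\log\frac{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2}{\sigma_{T_0}^2+\sigma_{T_2}^2}=\frac{1}{2}\log\frac{\sigma_X^2}{D_2}.\tag{15}$$

Henceforth we shall assume that for fixed $(D_1,D_2,D_3)$, $\sigma_{T_i}^2$ ($i=0,1,2$) are uniquely determined by (9) and (10), and consequently $\mathcal{R}^G(U_1,U_2)$ is given by (11). Since only the optimal MD coding scheme is of interest, the sum-rate $R_1+R_2$ should be minimized subject to the distortion constraints $(D_1,D_2,D_3)$, i.e., $(R_1,R_2)$ must be on the dominant face of $\mathcal{R}^G(U_1,U_2)$. Thus for fixed $(D_1,D_2,D_3)$,

$$R_1+R_2=\frac{1}{2}\log\frac{\sigma_X^2}{D_3}+\frac{1}{2}\log\psi(D_1,D_2,D_3).\tag{16}$$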
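As a numerical cross-check of (12)-(16), the minimal sketch below computes both vertices from the noise variances. It then confirms that each vertex lies on the sum-rate line (16) and that (13) agrees with its closed form in terms of $\psi$. The parameters are illustrative and rates are in bits. The helper `psi_mid`, which covers only the non-degenerate branch of $\psi$, is a hypothetical name.

```python
import math


def psi_mid(d1, d2, d3, var):
    """psi(D1, D2, D3) in the non-degenerate region (third branch of its definition)."""
    num = (var - d3) ** 2
    cross = math.sqrt((var - d1) * (var - d2)) - math.sqrt((d1 - d3) * (d2 - d3))
    return num / (num - cross ** 2)


def vertices(d1, d2, d3, var):
    """Vertices V_1^G and V_2^G of R^G(U_1, U_2) in bits, via (12)-(15)."""
    s0 = d3 * var / (var - d3)                                   # (9)
    s1 = d1 * var / (var - d1) - s0                              # (10)
    s2 = d2 * var / (var - d2) - s0
    e = s0 * (math.sqrt(s1) + math.sqrt(s2)) ** 2                # sigma_T0^2 (sigma_T1 + sigma_T2)^2
    v1 = (0.5 * math.log2((var + s0 + s1) / (s0 + s1)),          # R_1(V_1^G), (12)
          0.5 * math.log2((var + s0 + s2) * (s0 + s1) / e))      # R_2(V_1^G), (13)
    v2 = (0.5 * math.log2((var + s0 + s1) * (s0 + s2) / e),      # R_1(V_2^G), (14)
          0.5 * math.log2((var + s0 + s2) / (s0 + s2)))          # R_2(V_2^G), (15)
    return v1, v2


d1, d2, d3, var = 0.3, 0.2, 0.08, 1.0
v1, v2 = vertices(d1, d2, d3, var)
sum_rate = 0.5 * math.log2(var / d3) + 0.5 * math.log2(psi_mid(d1, d2, d3, var))  # (16)
print(v1, v2, sum_rate)
```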

B. Successive Quantization for Gaussian Source 

If we view $U_1$, $U_2$ as two different quantizations of $X^G$ and let $U_1-X^G$ and $U_2-X^G$ be their corresponding quantization errors, then it follows that

$$E[(U_1-X^G)(U_2-X^G)]=E[(T_0+T_1)(T_0+T_2)]=\sigma_{T_0}^2-\sigma_{T_1}\sigma_{T_2},\tag{17}$$

which is nonzero unless $D_3=(1/D_1+1/D_2-1/\sigma_X^2)^{-1}$. The correlation between the quantization errors is the main difficulty in designing optimal MD quantization schemes. To circumvent this difficulty, $U_1$ and $U_2$ can be represented in a different form by using the Gram-Schmidt orthogonalization, which yields

$$\begin{aligned}
B_1&=X^G,\\
B_2&=U_1-E(U_1|X^G)=U_1-X^G,\\
B_3&=U_2-E(U_2|X^G,U_1)=U_2-a_1X^G-a_2U_1,
\end{aligned}$$

where

$$a_1=\frac{\sigma_{T_1}^2+\sigma_{T_1}\sigma_{T_2}}{\sigma_{T_0}^2+\sigma_{T_1}^2},\tag{18}$$
$$a_2=\frac{\sigma_{T_0}^2-\sigma_{T_1}\sigma_{T_2}}{\sigma_{T_0}^2+\sigma_{T_1}^2}.\tag{19}$$

It can be computed that

$$EB_2^2=\sigma_{T_0}^2+\sigma_{T_1}^2,\tag{20}$$
$$EB_3^2=\frac{\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2}{\sigma_{T_0}^2+\sigma_{T_1}^2}.\tag{21}$$

Now consider the quantization scheme for vertex $V_1^G$ of $\mathcal{R}^G(U_1,U_2)$ (see Fig. 4). $R_1(V_1^G)$ is given by

$$R_1(V_1^G)=I(X^G;U_1)=I(X^G;X^G+B_2).\tag{22}$$

Since $U_2=E(U_2|X^G,U_1)+B_3$, where $B_3$ is independent of $(X^G,U_1)$, it follows that $(X^G,U_1)\rightarrow E(U_2|X^G,U_1)\rightarrow U_2$ form a Markov chain. Clearly, $E(U_2|X^G,U_1)\rightarrow(X^G,U_1)\rightarrow U_2$ also form a Markov chain since $E(U_2|X^G,U_1)$ is a deterministic function of $(X^G,U_1)$. These two Markov relationships imply that

$$I(X^G,U_1;U_2)=I(E(U_2|X^G,U_1);U_2),$$

and thus

$$\begin{aligned}
R_2(V_1^G)&=I(X^G,U_1;U_2)\\
&=I(E(U_2|X^G,U_1);U_2)\\
&=I(a_1X^G+a_2U_1;a_1X^G+a_2U_1+B_3).
\end{aligned}\tag{23}$$

Although the above expressions are all single-letter, this does not mean that symbol-by-symbol operations can achieve the optimal bound. Instead, when interpreting these information-theoretic results, one should think of a system that operates on long blocks. Roughly speaking, (22) and (23) imply that

1) Encoder 1 is a quantizer of rate $R_1(V_1^G)$ whose input is $X^G$ and output is $U_1$. The quantization error $B_2=U_1-X^G$ is a zero-mean Gaussian vector with covariance matrix $EB_2^2I_n$.

2) Encoder 2 is a quantizer of rate $R_2(V_1^G)$ with input $a_1X^G+a_2U_1$ and output $U_2$. The quantization error $B_3=U_2-a_1X^G-a_2U_1$ is a zero-mean Gaussian vector with covariance matrix $EB_3^2I_n$.

Remarks:
1) $U_1$ (or $U_2$) is not a deterministic function of $X^G$ (or $a_1X^G+a_2U_1$), and for classical quantizers the quantization noise is generally not Gaussian. Thus, strictly speaking, the "noise-adding" components in Fig. 4 are not quantizers in the traditional sense. We nevertheless refer to them as quantizers⁷ in this section for simplicity.

2) $U_1$ is revealed to decoder 1 and decoder 3, and $U_2$ is revealed to decoder 2 and decoder 3. Decoder $i$ approximates $X^G$ by $\hat X_i^G=\alpha_iU_i$, $i=1,2$. Decoder 3 approximates $X^G$ by $\hat X_3^G=\beta_1U_1+\beta_2U_2$. The rates needed to reveal $U_1$ and $U_2$ are the rates of description 1 and description 2, respectively.

3) From Fig. 4 it is clear that the MD quantization for $V_1^G$ is essentially the Gram-Schmidt orthogonalization of $(X^G,U_1,U_2)$. As shown in Section II, the Gram-Schmidt orthogonalization can be simulated by sequential (dithered) quantization. The formal description and analysis of this quantization scheme in the context of multiple descriptions for general sources will be given in Section V.
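To make the Gram-Schmidt view concrete, the minimal sketch below runs classical Gram-Schmidt on $(X^G,U_1,U_2)$ under the inner product $\langle A,B\rangle=E[AB]$. Each random variable is represented by its coefficient vector over the independent components $(X^G,T_0,T_1)$, with $T_2=-(\sigma_{T_2}/\sigma_{T_1})T_1$ realizing $E(T_1T_2)=-\sigma_{T_1}\sigma_{T_2}$. The sketch then compares the projection coefficients and innovation variances with (18)-(21). The representation, the helper names and the numeric parameters are assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Classical Gram-Schmidt under <A, B> = E[AB] on coefficient vectors.

    Returns the innovations B_k and, for each input, its projection
    coefficients on the earlier innovations.
    """
    basis, coeffs = [], []
    for v in vectors:
        c = [dot(v, b) / dot(b, b) for b in basis]
        r = list(v)
        for ck, b in zip(c, basis):
            r = [ri - ck * bi for ri, bi in zip(r, b)]
        basis.append(r)
        coeffs.append(c)
    return basis, coeffs


d1, d2, d3, var = 0.3, 0.2, 0.08, 1.0
s0 = d3 * var / (var - d3)                 # (9)
s1 = d1 * var / (var - d1) - s0            # (10)
s2 = d2 * var / (var - d2) - s0
sx, t0, t1, t2 = math.sqrt(var), math.sqrt(s0), math.sqrt(s1), math.sqrt(s2)
x, u1, u2 = [sx, 0, 0], [sx, t0, t1], [sx, t0, -t2]   # components (X^G, T_0, T_1)

(B1, B2, B3), (_, c_u1, c_u2) = gram_schmidt([x, u1, u2])
# U_2 = c0 X^G + c1 (U_1 - X^G) + B_3, so a_1 = c0 - c1 and a_2 = c1.
a1, a2 = c_u2[0] - c_u2[1], c_u2[1]
print(a1, a2, dot(B2, B2), dot(B3, B3))
```

Representing variables as coefficient vectors turns every expectation into a dot product. That keeps the sketch exact and free of sampling noise.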



Fig. 4. MD quantization scheme for $V_1^G$.

C. Successive Quantization with Quantization Splitting for Gaussian Source 

Now we study the quantization scheme for an arbitrary rate pair $(R_1^G,R_2^G)$ on the dominant face of $\mathcal{R}^G(U_1,U_2)$. Note that since the rate sum $R_1^G+R_2^G$ is given by (16), $(R_1^G,R_2^G)$ has only one degree of freedom.

Let $U_2'=X^G+T_0+T_2+T_3$, where $T_3$ is zero-mean, Gaussian and independent of $(X^G,T_0,T_1,T_2)$. It is easy to verify that $(X^G,U_1)\rightarrow U_2\rightarrow U_2'$ form a Markov chain. Applying the Gram-Schmidt orthogonalization algorithm

7 This slight abuse of the word "quantizer" can be justified in the context of ECDQ (as we will show in the next section) since the quantization 
noise of the optimal lattice quantizer is indeed asymptotically Gaussian; furthermore, the quantization noise is indeed independent of the input 
for ECDQ [37]. 
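to $(X^G,U_2',U_1)$, we have

$$\begin{aligned}
B_1&=X^G,\\
B_2&=U_2'-E(U_2'|X^G)=U_2'-X^G,\\
B_3&=U_1-E(U_1|X^G,U_2')=U_1-b_1X^G-b_2U_2',
\end{aligned}$$

where

$$b_1=\frac{\sigma_{T_2}^2+\sigma_{T_3}^2+\sigma_{T_1}\sigma_{T_2}}{\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2},\tag{24}$$
$$b_2=\frac{\sigma_{T_0}^2-\sigma_{T_1}\sigma_{T_2}}{\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2}.\tag{25}$$

The variances of $B_2$ and $B_3$ are

$$EB_2^2=\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2,\tag{26}$$
$$EB_3^2=\frac{\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2+\sigma_{T_3}^2(\sigma_{T_0}^2+\sigma_{T_1}^2)}{\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2}.\tag{27}$$

Since $U_1=E(U_1|X^G,U_2')+B_3$, where $B_3$ is independent of $(X^G,U_2')$, it follows that $(X^G,U_2')\rightarrow E(U_1|X^G,U_2')\rightarrow U_1$ form a Markov chain. Clearly, $E(U_1|X^G,U_2')\rightarrow(X^G,U_2')\rightarrow U_1$ also form a Markov chain because $E(U_1|X^G,U_2')$ is determined by $(X^G,U_2')$. Thus we have

$$I(X^G,U_2';U_1)=I(E(U_1|X^G,U_2');U_1),$$

and this gives

$$\begin{aligned}
R_1^G&=I(X^G,U_2';U_1)\\
&=I(E(U_1|X^G,U_2');U_1)\\
&=\frac{1}{2}\log\frac{EU_1^2}{EB_3^2}\\
&=\frac{1}{2}\log\frac{(\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_1}^2)(\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2)}{\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2+\sigma_{T_3}^2(\sigma_{T_0}^2+\sigma_{T_1}^2)}.
\end{aligned}$$

Hence $\sigma_{T_3}^2$ is uniquely determined by

$$\sigma_{T_3}^2=\frac{\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2\,2^{2R_1^G}-(\sigma_{T_0}^2+\sigma_{T_2}^2)(\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_1}^2)}{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_1}^2-2^{2R_1^G}(\sigma_{T_0}^2+\sigma_{T_1}^2)}.\tag{28}$$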

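We can also readily compute

$$\begin{aligned}
R_2^G&=I(X^G;U_1)+I(X^G;U_2)+I(U_1;U_2|X^G)-R_1^G\\
&=\frac{1}{2}\log\frac{[\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2+\sigma_{T_3}^2(\sigma_{T_0}^2+\sigma_{T_1}^2)](\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2)}{\sigma_{T_0}^2(\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2)(\sigma_{T_1}+\sigma_{T_2})^2}
\end{aligned}$$

and

$$R_1^G\big|_{\sigma_{T_3}^2=0}=\frac{1}{2}\log\frac{(\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_1}^2)(\sigma_{T_0}^2+\sigma_{T_2}^2)}{\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2}=R_1(V_2^G),$$
$$R_2^G\big|_{\sigma_{T_3}^2=0}=\frac{1}{2}\log\frac{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2}{\sigma_{T_0}^2+\sigma_{T_2}^2}=R_2(V_2^G),$$
$$R_1^G\big|_{\sigma_{T_3}^2=\infty}=\frac{1}{2}\log\frac{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_1}^2}{\sigma_{T_0}^2+\sigma_{T_1}^2}=R_1(V_1^G),$$
$$R_2^G\big|_{\sigma_{T_3}^2=\infty}=\frac{1}{2}\log\frac{(\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2)(\sigma_{T_0}^2+\sigma_{T_1}^2)}{\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2}=R_2(V_1^G).$$

Hence, as $\sigma_{T_3}^2$ varies from $0$ to $\infty$, all the rate pairs on the dominant face of $\mathcal{R}^G(U_1,U_2)$ are achieved.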
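The minimal sketch below traces the splitting parameter. For a target $R_1^G$ between $R_1(V_1^G)$ and $R_1(V_2^G)$, it solves (28) for $\sigma_{T_3}^2$ and recomputes $(R_1^G,R_2^G)$. It then checks that the sum-rate does not depend on $\sigma_{T_3}^2$ and that $\sigma_{T_3}^2=0$ and $\sigma_{T_3}^2\to\infty$ recover the two vertices. Rates are in bits. The target value 0.95, the distortions and the helper names are illustrative assumptions.

```python
import math


def split_sigma_t3(d1, d2, d3, var, r1_target):
    """Solve (28) for sigma_T3^2 given a target R_1^G (bits) on the dominant face."""
    s0 = d3 * var / (var - d3)
    s1 = d1 * var / (var - d1) - s0
    s2 = d2 * var / (var - d2) - s0
    e = s0 * (math.sqrt(s1) + math.sqrt(s2)) ** 2      # sigma_T0^2 (sigma_T1 + sigma_T2)^2
    a = var + s0 + s1                                   # E U_1^2
    g = 2.0 ** (2 * r1_target)
    s3 = (e * g - (s0 + s2) * a) / (a - g * (s0 + s1))
    return (s0, s1, s2), s3


def split_rates(s0, s1, s2, s3, var):
    """(R_1^G, R_2^G) in bits for a given sigma_T3^2; s3 = 0 gives the vertex V_2^G."""
    e = s0 * (math.sqrt(s1) + math.sqrt(s2)) ** 2
    num = e + s3 * (s0 + s1)                            # numerator of (27)
    r1 = 0.5 * math.log2((var + s0 + s1) * (s0 + s2 + s3) / num)
    r2 = 0.5 * math.log2(num * (var + s0 + s2) / ((s0 + s2 + s3) * e))
    return r1, r2


d1, d2, d3, var = 0.3, 0.2, 0.08, 1.0
(s0, s1, s2), s3 = split_sigma_t3(d1, d2, d3, var, r1_target=0.95)
r1, r2 = split_rates(s0, s1, s2, s3, var)
print(s3, r1, r2, r1 + r2)
```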
For the rate pair $(R_1^G,R_2^G)$, we have

$$R_1^G=I(X^G,U_2';U_1)=I(E(U_1|X^G,U_2');U_1)=I(b_1X^G+b_2U_2';b_1X^G+b_2U_2'+B_3),\tag{29}$$

$$\begin{aligned}
R_2^G&=I(X^G;U_2')+I(X^G,U_1;U_2|U_2')\\
&=I(X^G;X^G+B_2)+I(X^G,U_1;U_2|U_2').
\end{aligned}$$

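To remove the conditioning term $U_2'$ in $I(X^G,U_1;U_2|U_2')$, we apply the Gram-Schmidt procedure to $(U_2',X^G,U_1,U_2)$. It yields

$$\begin{aligned}
\tilde B_1&=U_2',\\
\tilde B_2&=X^G-E(X^G|\tilde B_1)=X^G-b_3\tilde B_1,\\
\tilde B_3&=U_1-E(U_1|\tilde B_1)-E(U_1|\tilde B_2)=U_1-b_4\tilde B_1-b_5\tilde B_2,\\
\tilde B_4&=U_2-\sum_{i=1}^{3}E(U_2|\tilde B_i)=U_2-b_6\tilde B_1-b_7\tilde B_2-b_8\tilde B_3,
\end{aligned}$$

where

$$b_3=\frac{\sigma_X^2}{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2},\tag{30}$$
$$b_4=\frac{\sigma_X^2+\sigma_{T_0}^2-\sigma_{T_1}\sigma_{T_2}}{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2},\tag{31}$$
$$b_5=\frac{\sigma_{T_2}^2+\sigma_{T_3}^2+\sigma_{T_1}\sigma_{T_2}}{\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2},\tag{32}$$
$$b_6=\frac{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2}{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2},\tag{33}$$
$$b_7=\frac{\sigma_{T_3}^2}{\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2},\tag{34}$$
$$b_8=\frac{\sigma_{T_3}^2(\sigma_{T_0}^2-\sigma_{T_1}\sigma_{T_2})}{\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2+\sigma_{T_3}^2(\sigma_{T_0}^2+\sigma_{T_1}^2)}.\tag{35}$$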






The following quantities are also needed:

$$E\tilde B_2^2=\frac{\sigma_X^2(\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2)}{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2},$$
$$E\tilde B_3^2=\frac{\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2+\sigma_{T_3}^2(\sigma_{T_0}^2+\sigma_{T_1}^2)}{\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2},$$
$$\begin{aligned}
E\tilde B_4^2&=\frac{\sigma_{T_3}^2(\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2)}{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2}-b_7^2E\tilde B_2^2-b_8^2E\tilde B_3^2\\
&=\frac{\sigma_{T_0}^2\sigma_{T_3}^2(\sigma_{T_1}+\sigma_{T_2})^2}{\sigma_{T_0}^2(\sigma_{T_1}+\sigma_{T_2})^2+\sigma_{T_3}^2(\sigma_{T_0}^2+\sigma_{T_1}^2)}.
\end{aligned}\tag{36}$$

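Now write

$$\begin{aligned}
I(X^G,U_1;U_2|U_2')&=I(b_3\tilde B_1+\tilde B_2,\,b_4\tilde B_1+b_5\tilde B_2+\tilde B_3;\,b_6\tilde B_1+b_7\tilde B_2+b_8\tilde B_3+\tilde B_4|\tilde B_1)\\
&=I(\tilde B_2,\,b_5\tilde B_2+\tilde B_3;\,b_7\tilde B_2+b_8\tilde B_3+\tilde B_4|\tilde B_1).
\end{aligned}$$

Since $\tilde B_1$ is independent of $(\tilde B_2,\tilde B_3,\tilde B_4)$, it follows that

$$I(\tilde B_2,b_5\tilde B_2+\tilde B_3;b_7\tilde B_2+b_8\tilde B_3+\tilde B_4|\tilde B_1)=I(\tilde B_2,b_5\tilde B_2+\tilde B_3;b_7\tilde B_2+b_8\tilde B_3+\tilde B_4).$$

The fact that $\tilde B_4$ is independent of $(\tilde B_2,\tilde B_3)$ implies that $(\tilde B_2,b_5\tilde B_2+\tilde B_3)\rightarrow b_7\tilde B_2+b_8\tilde B_3\rightarrow b_7\tilde B_2+b_8\tilde B_3+\tilde B_4$ form a Markov chain. This observation, along with the fact that $b_7\tilde B_2+b_8\tilde B_3$ is a deterministic function of $(\tilde B_2,b_5\tilde B_2+\tilde B_3)$, yields

$$I(\tilde B_2,b_5\tilde B_2+\tilde B_3;b_7\tilde B_2+b_8\tilde B_3+\tilde B_4)=I(b_7\tilde B_2+b_8\tilde B_3;b_7\tilde B_2+b_8\tilde B_3+\tilde B_4).$$

Hence

$$R_2^G=I(X^G;X^G+B_2)+I(b_7\tilde B_2+b_8\tilde B_3;b_7\tilde B_2+b_8\tilde B_3+\tilde B_4).\tag{37}$$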

Moreover, since

$$b_7\tilde B_2+b_8\tilde B_3=(b_7-b_5b_8)X^G+b_8U_1+(b_3b_5b_8-b_3b_7-b_4b_8)U_2',\tag{38}$$
$$b_7\tilde B_2+b_8\tilde B_3+\tilde B_4=U_2-b_6U_2',\tag{39}$$

it follows that

$$R_2^G=I(X^G;X^G+B_2)+I\big((b_7-b_5b_8)X^G+b_8U_1+(b_3b_5b_8-b_3b_7-b_4b_8)U_2';\,U_2-b_6U_2'\big).\tag{40}$$

Let $b_1^*=b_1$, $b_2^*=b_2$, $b_3^*=b_7-b_5b_8$, $b_4^*=b_8$, $b_5^*=b_3b_5b_8-b_3b_7-b_4b_8$ and $b_6^*=b_6$. Then (29) and (40) can be simplified to

$$R_1^G=I(b_1^*X^G+b_2^*U_2';U_1)=I(b_1^*X^G+b_2^*U_2';b_1^*X^G+b_2^*U_2'+B_3),\tag{41}$$
$$\begin{aligned}
R_2^G&=I(X^G;U_2')+I(b_3^*X^G+b_4^*U_1+b_5^*U_2';U_2-b_6^*U_2')\\
&=I(X^G;X^G+B_2)+I(b_3^*X^G+b_4^*U_1+b_5^*U_2';b_3^*X^G+b_4^*U_1+b_5^*U_2'+\tilde B_4).
\end{aligned}\tag{42}$$
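The coefficients (30)-(36) and the rate expression (37) can be checked the same way. The minimal sketch below represents $(U_2',X^G,U_1,U_2)$ as coefficient vectors over the independent components $(X^G,T_0,T_1,T_3)$, runs classical Gram-Schmidt with $\langle A,B\rangle=E[AB]$, and compares the results with the closed forms. It also recomputes $R_2^G$ through (37) and compares it with the expression derived from the total sum-rate. The value $\sigma_{T_3}^2=0.2$, the distortions and all helper names are illustrative assumptions. Rates are in bits.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Classical Gram-Schmidt with <A, B> = E[AB] on coefficient vectors."""
    basis, coeffs = [], []
    for v in vectors:
        c = [dot(v, b) / dot(b, b) for b in basis]
        r = list(v)
        for ck, b in zip(c, basis):
            r = [ri - ck * bi for ri, bi in zip(r, b)]
        basis.append(r)
        coeffs.append(c)
    return basis, coeffs


d1, d2, d3, var, s3 = 0.3, 0.2, 0.08, 1.0, 0.2   # s3 = sigma_T3^2 (illustrative)
s0 = d3 * var / (var - d3)
s1 = d1 * var / (var - d1) - s0
s2 = d2 * var / (var - d2) - s0
sx, t0, t1, t2, t3 = (math.sqrt(v) for v in (var, s0, s1, s2, s3))

# Components (X^G, T_0, T_1, T_3); T_2 = -(sigma_T2 / sigma_T1) T_1.
x, u1, u2 = [sx, 0, 0, 0], [sx, t0, t1, 0], [sx, t0, -t2, 0]
u2p = [sx, t0, -t2, t3]                          # U_2' = X^G + T_0 + T_2 + T_3

(B1, B2, B3, B4), (_, (b3,), (b4, b5), (b6, b7, b8)) = gram_schmidt([u2p, x, u1, u2])

S, e = s0 + s2 + s3, s0 * (t1 + t2) ** 2
den = e + s3 * (s0 + s1)
closed_form = {
    "b3": var / (var + S),                       # (30)
    "b4": (var + s0 - t1 * t2) / (var + S),      # (31)
    "b5": (s2 + s3 + t1 * t2) / S,               # (32)
    "b6": (var + s0 + s2) / (var + S),           # (33)
    "b7": s3 / S,                                # (34)
    "b8": s3 * (s0 - t1 * t2) / den,             # (35)
    "EB4": s0 * s3 * (t1 + t2) ** 2 / den,       # (36), variance of B~_4
}
numeric = {"b3": b3, "b4": b4, "b5": b5, "b6": b6, "b7": b7, "b8": b8, "EB4": dot(B4, B4)}

# R_2^G via (37) versus the closed form obtained from the total sum-rate.
r2_37 = (0.5 * math.log2((var + S) / S)
         + 0.5 * math.log2((b7 ** 2 * dot(B2, B2) + b8 ** 2 * dot(B3, B3) + dot(B4, B4)) / dot(B4, B4)))
r2_closed = 0.5 * math.log2(den * (var + s0 + s2) / (S * e))
print(numeric, closed_form, r2_37, r2_closed)
```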
Equations (41) and (42) suggest the following optimal MD quantization system (also see Fig. 5):

1) Encoder 1 is a quantizer of rate $R_1^G$ with input $b_1^*X^G+b_2^*U_2'$ and output $U_1$. The quantization error $B_3=U_1-b_1^*X^G-b_2^*U_2'$ is Gaussian with covariance matrix $EB_3^2I_n$.

2) Encoder 2 consists of two quantizers. The rate of the first quantizer is $R_{21}^G$. Its input and output are $X^G$ and $U_2'$, respectively. Its quantization error $B_2=U_2'-X^G$ is Gaussian with covariance matrix $EB_2^2I_n$. The second quantizer is of rate $R_{22}^G$. It has input $b_3^*X^G+b_4^*U_1+b_5^*U_2'$ and output $U_2-b_6^*U_2'$. Its quantization error $\tilde B_4$ is Gaussian with covariance matrix $E\tilde B_4^2I_n$. The sum-rate of these two quantizers is the rate of encoder 2, which is $R_2^G$. Here $R_{21}^G=I(X^G;U_2')$ and $R_{22}^G=R_2^G-R_{21}^G$.



Remarks:

1) $U_1$ is revealed to decoder 1 and decoder 3. $U_2'$ and $U_2-b_6^*U_2'$ are revealed to decoder 2 and decoder 3. Decoder 1 constructs $\hat X_1^G=\alpha_1U_1$. Decoder 2 first constructs $U_2$ using $U_2'$ and $U_2-b_6^*U_2'$, and then constructs $\hat X_2^G=\alpha_2U_2$. Decoder 3 also first constructs $U_2$, then constructs $\hat X_3^G=\beta_1U_1+\beta_2U_2$. Clearly, what decoder 2 and decoder 3 need is $U_2$, not $U_2'$ or $U_2-b_6^*U_2'$. Furthermore, the construction of $U_2$ can be moved to the encoder. That is, encoder 2 can directly construct $U_2$ from $U_2'$ and $U_2-b_6^*U_2'$; then only $U_2$ needs to be revealed to decoder 2 and decoder 3.

2) Since $\tilde B_4$ is independent of $(X^G,U_1,U_2')$ and $(B_2,B_3)$ is a deterministic function of $(X^G,U_1,U_2')$, $\tilde B_4$ is independent of $(B_2,B_3)$.

3) The MD quantization scheme for $(R_1^G,R_2^G)$ essentially consists of two Gram-Schmidt procedures, one operating on $(X^G,U_2',U_1)$ and the other on $(U_2',X^G,U_1,U_2)$. The formal description and analysis of this scheme from the perspective of dithered quantization is left to Section V.

Fig. 5. MD quantization scheme for $(R_1^G,R_2^G)$.



D. Discussion of Special Cases 



Next we consider three cases for which the MD quantizers have some special properties. 
1) The case $D_3=(1/D_1+1/D_2-1/\sigma_X^2)^{-1}$: For this case, we have

$$(R_1(V_1^G),R_2(V_1^G))=(R_1(V_2^G),R_2(V_2^G))=\left(\frac{1}{2}\log\frac{\sigma_X^2}{D_1},\frac{1}{2}\log\frac{\sigma_X^2}{D_2}\right),$$

which is referred to as the case of no excess marginal rate. Since the dominant face of $\mathcal{R}^G(U_1,U_2)$ degenerates to a single point, quantization splitting becomes unnecessary. Moreover, (17) gives $E(U_1-X^G)(U_2-X^G)=0$, i.e., the two quantization errors are uncorrelated (and thus independent, since $(X^G,U_1,U_2)$ are jointly Gaussian) in this case. This further implies that $U_1\rightarrow X^G\rightarrow U_2$ form a Markov chain. Due to this fact, the Gram-Schmidt orthogonalization for $(X^G,U_1,U_2)$ becomes particularly simple:

$$\begin{aligned}
B_1&=X^G,\\
B_2&=U_1-E(U_1|X^G)=U_1-X^G,\\
B_3&=U_2-E(U_2|X^G,U_1)=U_2-E(U_2|X^G)=U_2-X^G,
\end{aligned}$$

and

$$EB_2^2=\sigma_{T_0}^2+\sigma_{T_1}^2,\tag{43}$$
$$EB_3^2=\sigma_{T_0}^2+\sigma_{T_2}^2.\tag{44}$$

The resulting MD quantization system is as follows:

1) Encoder 1 is a quantizer of rate $I(X^G;U_1)$ whose input is $X^G$ and output is $U_1$. The quantization error $B_2$ is (approximately) Gaussian with covariance matrix $EB_2^2I_n$.

2) Encoder 2 is a quantizer of rate $I(X^G;U_2)$ with input $X^G$ and output $U_2$. The quantization error $B_3$ is (approximately) Gaussian with covariance matrix $EB_3^2I_n$.

So for this case, the conventional separate quantization scheme [31] suffices. See Fig. 6.



Fig. 6. Special case: $D_3=(1/D_1+1/D_2-1/\sigma_X^2)^{-1}$.
2) The case $D_3=D_1+D_2-\sigma_X^2$: For this case, we have

$$R_1(V_1^G)+R_2(V_1^G)=R_1(V_2^G)+R_2(V_2^G)=\frac{1}{2}\log\frac{\sigma_X^2}{D_3},$$

which corresponds to the case of no excess sum-rate. Since $D_3=D_1+D_2-\sigma_X^2$ implies $\sigma_X^2-D_1=D_2-D_3$ and $\sigma_X^2-D_2=D_1-D_3$, it follows that

$$\begin{aligned}
E(U_1U_2)&=\sigma_X^2+\sigma_{T_0}^2-\sigma_{T_1}\sigma_{T_2}\\
&=\sigma_X^2+\frac{D_3\sigma_X^2}{\sigma_X^2-D_3}-\sqrt{\left(\frac{D_1\sigma_X^2}{\sigma_X^2-D_1}-\frac{D_3\sigma_X^2}{\sigma_X^2-D_3}\right)\left(\frac{D_2\sigma_X^2}{\sigma_X^2-D_2}-\frac{D_3\sigma_X^2}{\sigma_X^2-D_3}\right)}\\
&=\frac{\sigma_X^4}{\sigma_X^2-D_3}-\frac{\sigma_X^4}{\sigma_X^2-D_3}=0.
\end{aligned}\tag{45}$$

Since $U_1$ and $U_2$ are jointly Gaussian, (45) implies that $U_1$ and $U_2$ are independent. This is consistent with the result in [6], although only discrete memoryless sources were addressed there due to technical reasons. The interpretation of (45) is that the outputs of the two encoders (quantizers) should be independent. This is intuitively clear: otherwise the two outputs could be further compressed to reduce the sum-rate while still achieving distortion $D_3$ for the joint description. That would violate the rate-distortion theorem, since $\frac{1}{2}\log\frac{\sigma_X^2}{D_3}$ is the minimum admissible rate for the quadratic Gaussian case.



2 IU & D 2 




achievable with timesharing 



2 iu & Di 



|l°g^ Rx 



Fig. 7. Special case: D3 = D\ + D2 — a\. 



Now consider the following timesharing scheme. Construct an optimal rate-distortion codebook of rate $\frac{1}{2}\log\frac{\sigma_X^2}{D_3}$ that can achieve distortion $D_3$. Encoder 1 uses this codebook a fraction $\gamma$ ($0\le\gamma\le1$) of the time and encoder 2 uses this codebook the remaining $1-\gamma$ fraction of the time. For this scheme, the resulting rates and distortions are given by $R_1=\frac{\gamma}{2}\log\frac{\sigma_X^2}{D_3}$, $R_2=\frac{1-\gamma}{2}\log\frac{\sigma_X^2}{D_3}$, $D_1=\gamma D_3+(1-\gamma)\sigma_X^2$, $D_2=(1-\gamma)D_3+\gamma\sigma_X^2$, and $D_3$. Conversely, for any fixed $D_1^*$ and $D_2^*$ with $D_1^*+D_2^*=\sigma_X^2+D_3$, there exists a $\gamma^*\in[0,1]$ such that $D_1^*=\gamma^*D_3+(1-\gamma^*)\sigma_X^2$ and $D_2^*=(1-\gamma^*)D_3+\gamma^*\sigma_X^2$. The associated rates are $R_1=\frac{\gamma^*}{2}\log\frac{\sigma_X^2}{D_3}$ and $R_2=\frac{1-\gamma^*}{2}\log\frac{\sigma_X^2}{D_3}$. So the timesharing scheme can achieve any point on the dominant face of the rate region for the special case $D_3=D_1+D_2-\sigma_X^2$ (see Fig. 7). Specifically, for the symmetric case where $D_1=D_2=\frac{1}{2}(D_3+\sigma_X^2)$, we have $\gamma^*=\frac{1}{2}$ and $R_1=R_2=\frac{1}{4}\log\frac{\sigma_X^2}{D_3}$.
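The timesharing argument amounts to a few lines of arithmetic. The minimal sketch below solves $D_1=\gamma^*D_3+(1-\gamma^*)\sigma_X^2$ for $\gamma^*$ and returns the matching $D_2$ and the two rates (in bits, an assumption about the log base). The function name and the numbers are illustrative only.

```python
import math


def timesharing_point(d1, d3, var):
    """Timesharing for D_3 = D_1 + D_2 - sigma_X^2: returns (gamma*, D_2, R_1, R_2), rates in bits."""
    gamma = (var - d1) / (var - d3)          # solves D_1 = gamma D_3 + (1 - gamma) sigma_X^2
    d2 = (1 - gamma) * d3 + gamma * var
    full_rate = 0.5 * math.log2(var / d3)    # rate of the single rate-distortion codebook
    return gamma, d2, gamma * full_rate, (1 - gamma) * full_rate


print(timesharing_point(0.55, 0.1, 1.0))    # symmetric case: D_1 = (D_3 + sigma_X^2) / 2
print(timesharing_point(0.3, 0.1, 1.0))
```

Because $\log$ is concave, the timeshared rates automatically satisfy the marginal constraints $R_i\ge\frac{1}{2}\log\frac{\sigma_X^2}{D_i}$. The checks below confirm this for one asymmetric point.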

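3) The symmetric case $D_1=D_2\triangleq D_{12}$: The symmetric case is of particular practical importance. Moreover, many previously derived expressions take simpler forms if $D_1=D_2$. Specifically, we have

$$\sigma_{T_1}^2=\sigma_{T_2}^2=\frac{D_{12}\sigma_X^2}{\sigma_X^2-D_{12}}-\frac{D_3\sigma_X^2}{\sigma_X^2-D_3}\triangleq\sigma_{T_{12}}^2$$

and

$$\alpha_1=\alpha_2=\frac{\sigma_X^2}{\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_{12}}^2}\triangleq\alpha.$$

The coordinates of $V_1^G$ and $V_2^G$ become

$$R_1(V_1^G)=R_2(V_2^G)=\frac{1}{2}\log\frac{\sigma_X^2}{D_{12}},$$
$$R_2(V_1^G)=R_1(V_2^G)=\frac{1}{2}\log\frac{D_{12}(\sigma_X^2-D_3)^2}{4D_3(\sigma_X^2-D_{12})(D_{12}-D_3)}.$$

The expressions for $R_1^G$ and $R_2^G$ can be simplified to

$$R_1^G=\frac{1}{2}\log\frac{(\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_{12}}^2)(\sigma_{T_0}^2+\sigma_{T_{12}}^2+\sigma_{T_3}^2)}{4\sigma_{T_0}^2\sigma_{T_{12}}^2+\sigma_{T_3}^2(\sigma_{T_0}^2+\sigma_{T_{12}}^2)},$$
$$R_2^G=\frac{1}{2}\log\frac{[4\sigma_{T_0}^2\sigma_{T_{12}}^2+\sigma_{T_3}^2(\sigma_{T_0}^2+\sigma_{T_{12}}^2)](\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_{12}}^2)}{4\sigma_{T_0}^2\sigma_{T_{12}}^2(\sigma_{T_0}^2+\sigma_{T_{12}}^2+\sigma_{T_3}^2)}.$$

To keep the rates equal, i.e., $R_1^G=R_2^G$, it must be true that

$$\begin{aligned}
&4\sigma_{T_0}^2\sigma_{T_{12}}^2+\sigma_{T_3}^2(\sigma_{T_0}^2+\sigma_{T_{12}}^2)=2\sigma_{T_0}\sigma_{T_{12}}(\sigma_{T_0}^2+\sigma_{T_{12}}^2+\sigma_{T_3}^2)\\
\Leftrightarrow\ &(\sigma_{T_0}-\sigma_{T_{12}})^2(\sigma_{T_3}^2-2\sigma_{T_0}\sigma_{T_{12}})=0.
\end{aligned}$$

If $\sigma_{T_0}\neq\sigma_{T_{12}}$, then

$$\sigma_{T_3}^2=2\sigma_{T_0}\sigma_{T_{12}},\tag{46}$$

i.e.,

$$\sigma_{T_3}^2=2\sqrt{\frac{D_3\sigma_X^2}{\sigma_X^2-D_3}\left(\frac{D_{12}\sigma_X^2}{\sigma_X^2-D_{12}}-\frac{D_3\sigma_X^2}{\sigma_X^2-D_3}\right)}.\tag{47}$$

If $\sigma_{T_0}=\sigma_{T_{12}}$, then

$$R_1(V_1^G)=R_2(V_1^G)=R_1^G=R_2^G=R_1(V_2^G)=R_2(V_2^G),\quad\forall\,\sigma_{T_3}^2\ge0,$$

i.e., $(R_1^G,R_2^G)$ is not a function of $\sigma_{T_3}^2$.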
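As a quick check of (46) and (47), the minimal sketch below picks $\sigma_{T_3}^2=2\sigma_{T_0}\sigma_{T_{12}}$ in a symmetric example where $\sigma_{T_0}\neq\sigma_{T_{12}}$. It confirms that the simplified expressions give $R_1^G=R_2^G$, with the sum equal to the optimal sum-rate (16). The parameters ($\sigma_X^2=1$, $D_{12}=0.2$, $D_3=0.05$) and the function name are illustrative. Rates are in bits.

```python
import math


def symmetric_split(d12, d3, var):
    """Equal-rate splitting point for D_1 = D_2 = D_12 via (46)-(47); rates in bits."""
    s0 = d3 * var / (var - d3)                      # sigma_T0^2, (9)
    s12 = d12 * var / (var - d12) - s0              # sigma_T12^2
    s3 = 2 * math.sqrt(s0 * s12)                    # sigma_T3^2 = 2 sigma_T0 sigma_T12
    k = 4 * s0 * s12 + s3 * (s0 + s12)
    r1 = 0.5 * math.log2((var + s0 + s12) * (s0 + s12 + s3) / k)
    r2 = 0.5 * math.log2(k * (var + s0 + s12) / (4 * s0 * s12 * (s0 + s12 + s3)))
    return s3, r1, r2


s3, r1, r2 = symmetric_split(0.2, 0.05, 1.0)
print(s3, r1, r2)
```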
V. Optimal Multiple Description Quantization System

In the MD quantization scheme for the quadratic Gaussian case outlined in the preceding section, only the second-order statistics are needed, and the resulting quantization system consists mainly of linear operations. In this section we develop this system in the context of Entropy Coded Dithered (lattice) Quantization (ECDQ) for general sources with the squared error distortion measure. The proposed system may not be optimal for general sources; however, if all the underlying second-order statistics are kept identical to those of the quadratic Gaussian case, then the resulting distortions will also be the same. Furthermore, since the Gaussian source has the highest differential entropy among all i.i.d. sources with the same variance, the rates of the quantizers can be upper-bounded by the rates in the quadratic Gaussian case. At high resolution, we prove a stronger result: the proposed MD quantization system is asymptotically optimal for all i.i.d. sources that have finite differential entropy.

In the sequel we discuss the MD quantization schemes in an order that parallels the development in the preceding section. The source $\{X(t)\}_{t=1}^{\infty}$ is assumed to be an i.i.d. random process (not necessarily Gaussian) with $EX(t)=0$ and $EX^2(t)=\sigma_X^2$ for all $t$.

A. Successive Quantization Using ECDQ 

Consider the MD quantization system depicted in Fig. 8, which corresponds to the Gaussian MD coding scheme for $V_1^G$. Let $Q_{1,n}(\cdot)$ and $Q_{2,n}(\cdot)$ denote optimal $n$-dimensional lattice quantizers. Let $Z_1$ and $Z_2$ be $n$-dimensional random vectors which are statistically independent, each uniformly distributed over the basic cell of the associated lattice quantizer. The lattices have a "white" quantization noise covariance matrix of the form $\sigma_i^2I_n=EZ_iZ_i^T$, where $\sigma_i^2$ is the second moment of the lattice quantizer $Q_{i,n}(\cdot)$, $i=1,2$; more specifically, let $\sigma_1^2=EB_2^2$ and $\sigma_2^2=EB_3^2$, where $EB_2^2$ and $EB_3^2$ are given by (20) and (21), respectively. Furthermore, let

$$W_1=Q_{1,n}(X+Z_1)-Z_1,$$
$$W_2=Q_{2,n}(a_1X+a_2W_1+Z_2)-Z_2,$$

where $a_1$ and $a_2$ are given by (18) and (19), respectively.

Theorem 5.1: The first- and second-order statistics of $(X,W_1,W_2)$ are the same as the first- and second-order statistics of $(X^G,U_1,U_2)$ in Section IV.

Remark: The first-order statistics of $(X,W_1,W_2)$ and $(X^G,U_1,U_2)$ are all zero. In fact, all the random variables and random vectors in this paper (except those in Sections I and III) have zero mean, so we focus on the second-order statistics.

Proof: The theorem follows directly from the correspondence between the Gram-Schmidt orthogonalization and sequential (dithered) quantization established in Section II. It is straightforward to verify by comparing Fig. 4 and Fig. 8. Essentially, $X$, $Z_1$ and $Z_2$ serve as the innovations that generate the first- and second-order statistics of the whole system. ■
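Theorem 5.1 can be illustrated by a small Monte Carlo experiment. The minimal sketch below makes four assumptions: the optimal lattice quantizers are replaced by scalar uniform quantizers ($n=1$, lattice $\Delta\mathbb{Z}$) with $\Delta^2/12$ equal to the target noise variances (20)-(21); the source is uniform rather than Gaussian; the parameters are illustrative; and `quantize` and `simulate_vertex` are hypothetical helpers. With subtractive dither, each quantization error is uniform and independent of the quantizer input, so the empirical distortions should approach $D_1$, $D_2$, $D_3$ even though the source is not Gaussian.

```python
import math
import random


def quantize(v, step):
    """Nearest point of the scaled integer lattice step * Z (scalar stand-in for Q_{i,n})."""
    return step * math.floor(v / step + 0.5)


def simulate_vertex(d1, d2, d3, var, n_samples=100_000, seed=1):
    """Monte Carlo of W_1 = Q_1(X + Z_1) - Z_1 and W_2 = Q_2(a_1 X + a_2 W_1 + Z_2) - Z_2.

    Returns empirical E(X - alpha_1 W_1)^2, E(X - alpha_2 W_2)^2 and
    E(X - beta_1 W_1 - beta_2 W_2)^2.
    """
    s0 = d3 * var / (var - d3)                         # (9)
    s1 = d1 * var / (var - d1) - s0                    # (10)
    s2 = d2 * var / (var - d2) - s0
    t1, t2 = math.sqrt(s1), math.sqrt(s2)
    a2 = (s0 - t1 * t2) / (s0 + s1)                    # (19)
    a1 = 1.0 - a2                                      # (18), since a_1 + a_2 = 1
    eb2 = s0 + s1                                      # (20)
    eb3 = s0 * (t1 + t2) ** 2 / (s0 + s1)              # (21)
    step1, step2 = math.sqrt(12 * eb2), math.sqrt(12 * eb3)  # step^2 / 12 = noise variance
    alpha1, alpha2 = var / (var + s0 + s1), var / (var + s0 + s2)
    beta1 = var * t2 / ((t1 + t2) * (var + s0))
    beta2 = var * t1 / ((t1 + t2) * (var + s0))
    rng, half = random.Random(seed), math.sqrt(3 * var)    # uniform source on [-half, half]
    err = [0.0, 0.0, 0.0]
    for _ in range(n_samples):
        x = rng.uniform(-half, half)
        z1 = rng.uniform(-step1 / 2, step1 / 2)            # subtractive dithers
        z2 = rng.uniform(-step2 / 2, step2 / 2)
        w1 = quantize(x + z1, step1) - z1
        w2 = quantize(a1 * x + a2 * w1 + z2, step2) - z2
        err[0] += (x - alpha1 * w1) ** 2
        err[1] += (x - alpha2 * w2) ** 2
        err[2] += (x - beta1 * w1 - beta2 * w2) ** 2
    return [v / n_samples for v in err]


print(simulate_vertex(0.3, 0.2, 0.08, 1.0))  # targets: D_1 = 0.3, D_2 = 0.2, D_3 = 0.08
```

Only the distortions are checked here. The rates of scalar quantizers carry the extra gap term discussed below, so they are not expected to reach the vertex rates.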
Fig. 8. Successive quantization.



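By Theorem 5.1,

$$\frac{1}{n}E\|X-\alpha_iW_i\|^2=\frac{1}{n}E\|X^G-\alpha_iU_i\|^2=D_i,\quad i=1,2,$$
$$\frac{1}{n}E\Big\|X-\sum_{i=1}^{2}\beta_iW_i\Big\|^2=\frac{1}{n}E\Big\|X^G-\sum_{i=1}^{2}\beta_iU_i\Big\|^2=D_3.$$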

Let $N_i$ be an $n$-dimensional random vector distributed as $-Z_i$, $i=1,2$. By property 2) of the ECDQ, we have

$$H(Q_{1,n}(X+Z_1)|Z_1)=I(X;X+N_1)=h(X+N_1)-h(N_1),$$
$$H(Q_{2,n}(a_1X+a_2W_1+Z_2)|Z_2)=I(a_1X+a_2W_1;a_1X+a_2W_1+N_2)=h(a_1X+a_2W_1+N_2)-h(N_2).$$

Thus we can upper-bound the rate of $Q_{1,n}(\cdot)$ (conditioned on $Z_1$) as follows:

$$\begin{aligned}
R_1&=\frac{1}{n}H(Q_{1,n}(X+Z_1)|Z_1)\\
&=\frac{1}{n}h(X+N_1)-\frac{1}{n}h(N_1)\\
&=\frac{1}{n}h(W_1)-\frac{1}{n}h(N_1)\\
&\le\frac{1}{n}h(U_1)-\frac{1}{n}h(N_1)\\
&=\frac{1}{2}\log[2\pi e(\sigma_X^2+EB_2^2)]-\frac{1}{2}\log\frac{EB_2^2}{G_{\rm opt}^{(n)}}\\
&=R_1(V_1^G)+\frac{1}{2}\log(2\pi eG_{\rm opt}^{(n)}),
\end{aligned}$$

where the inequality follows from Theorem 5.1 and the fact that, for a given covariance matrix, the jointly Gaussian distribution maximizes the differential entropy. Similarly, the rate of $Q_{2,n}(\cdot)$ (conditioned on $Z_2$) can be upper-bounded as follows:

$$\begin{aligned}
R_2&=\frac{1}{n}H(Q_{2,n}(a_1X+a_2W_1+Z_2)|Z_2)\\
&=\frac{1}{n}h(a_1X+a_2W_1+N_2)-\frac{1}{n}h(N_2)\\
&=\frac{1}{n}h(W_2)-\frac{1}{n}h(N_2)\\
&\le\frac{1}{n}h(U_2)-\frac{1}{n}h(N_2)\\
&=\frac{1}{2}\log[2\pi e(\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2)]-\frac{1}{2}\log\frac{EB_3^2}{G_{\rm opt}^{(n)}}\\
&=R_2(V_1^G)+\frac{1}{2}\log(2\pi eG_{\rm opt}^{(n)}).
\end{aligned}$$

Since $G_{\rm opt}^{(n)}\rightarrow\frac{1}{2\pi e}$ as $n\rightarrow\infty$, we have $R_1\le R_1(V_1^G)$ and $R_2\le R_2(V_1^G)$ as $n\rightarrow\infty$.
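The only loss in these bounds is the term $\frac{1}{2}\log(2\pi eG)$ per quantizer. The minimal sketch below evaluates it in bits for three cases, using standard values of the normalized second moment: the integer lattice ($G=1/12$, about 0.254 bit), the hexagonal lattice $A_2$ ($G=5/(36\sqrt{3})$), and the limit $G\to1/(2\pi e)$. It also prints three times the gap, which corresponds to the sum-rate of the splitting scheme in (48)-(49). The base-2 logarithm and the function name are assumptions.

```python
import math


def ecdq_gap_bits(g):
    """Per-quantizer rate gap (1/2) log2(2*pi*e*G) for normalized second moment G."""
    return 0.5 * math.log2(2 * math.pi * math.e * g)


G_SCALAR = 1 / 12                          # integer lattice Z, n = 1
G_HEX = 5 / (36 * math.sqrt(3))            # hexagonal lattice A_2, n = 2
G_LIMIT = 1 / (2 * math.pi * math.e)       # limit of G_opt^(n) as n -> infinity

for name, g in (("Z", G_SCALAR), ("A2", G_HEX), ("limit", G_LIMIT)):
    gap = ecdq_gap_bits(g)
    # The vertex scheme adds one gap to each of R_1 and R_2 (two in the sum-rate);
    # the splitting scheme adds three gaps to the sum-rate, see (48)-(49).
    print(f"{name:5s} G={g:.6f} per-quantizer gap={gap:.4f} bit, 3x gap={3 * gap:.4f} bit")
```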

B. Successive Quantization With Quantization Splitting Using ECDQ 

Now we proceed to construct the MD quantization system using ECDQ in a manner that corresponds to the Gaussian MD quantization scheme for an arbitrary rate pair $(R_1^G,R_2^G)$.

Let $Q_{1,n}^*(\cdot)$, $Q_{2,n}^*(\cdot)$, and $Q_{3,n}^*(\cdot)$ denote optimal $n$-dimensional lattice quantizers. Let $Z_1^*$, $Z_2^*$, and $Z_3^*$ be $n$-dimensional random vectors which are statistically independent, each uniformly distributed over the basic cell of the associated lattice quantizer. The lattices have a "white" quantization noise covariance matrix of the form $\sigma_i^{*2}I_n=EZ_i^*Z_i^{*T}$, where $\sigma_i^{*2}$ is the second moment of the lattice quantizer $Q_{i,n}^*(\cdot)$, $i=1,2,3$; more specifically, let $\sigma_1^{*2}=EB_2^2$, $\sigma_2^{*2}=EB_3^2$, and $\sigma_3^{*2}=E\tilde B_4^2$, where $EB_2^2$, $EB_3^2$ and $E\tilde B_4^2$ are given by (26), (27) and (36), respectively. Define

$$W_2'=Q_{1,n}^*(X+Z_1^*)-Z_1^*,$$
$$W_1=Q_{2,n}^*(b_1^*X+b_2^*W_2'+Z_2^*)-Z_2^*,$$
$$\tilde W_2=Q_{3,n}^*(b_3^*X+b_4^*W_1+b_5^*W_2'+Z_3^*)-Z_3^*,$$
$$W_2=\tilde W_2+b_6^*W_2'.$$

The system diagram is shown in Fig. 9.

Theorem 5.2: The first- and second-order statistics of $(X,W_1,W_2,W_2',\tilde W_2)$ equal the first- and second-order statistics of $(X^G,U_1,U_2,U_2',U_2-b_6^*U_2')$ in Section IV.

Proof: By comparing Fig. 5 and Fig. 9, it is clear that the theorem follows from the correspondence between the Gram-Schmidt orthogonalization and sequential (dithered) quantization. The following one-to-one correspondences should be emphasized: $B_2$ and $-Z_1^*$, $B_3$ and $-Z_2^*$, $\tilde B_4$ and $-Z_3^*$. $X$, $Z_1^*$, $Z_2^*$ and $Z_3^*$ are the innovations that generate the first- and second-order statistics of the whole system. ■
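The same kind of Monte Carlo check applies to the splitting system of Fig. 9. The minimal sketch below rests on illustrative assumptions: scalar uniform quantizers with subtractive dither stand in for $Q_{i,n}^*$; the source is uniform rather than Gaussian; $\sigma_{T_3}^2=0.2$ is an arbitrary choice; and the helper names are hypothetical. It builds $b_1^*,\ldots,b_6^*$ from (24)-(25) and (30)-(35), sets the step sizes from (26), (27) and (36), and estimates the three distortions, which Theorem 5.2 says should match $D_1$, $D_2$, $D_3$.

```python
import math
import random


def quantize(v, step):
    """Nearest point of the scaled integer lattice step * Z (scalar stand-in for Q*_{i,n})."""
    return step * math.floor(v / step + 0.5)


def simulate_split(d1, d2, d3, var, s3, n_samples=100_000, seed=7):
    """Monte Carlo of W_2', W_1, W~_2 and W_2 = W~_2 + b_6^* W_2' with scalar dithered quantizers."""
    s0 = d3 * var / (var - d3)
    s1 = d1 * var / (var - d1) - s0
    s2 = d2 * var / (var - d2) - s0
    t1, t2 = math.sqrt(s1), math.sqrt(s2)
    S, e = s0 + s2 + s3, s0 * (t1 + t2) ** 2
    den = e + s3 * (s0 + s1)
    b1, b2 = (s2 + s3 + t1 * t2) / S, (s0 - t1 * t2) / S            # (24)-(25)
    b3, b4 = var / (var + S), (var + s0 - t1 * t2) / (var + S)        # (30)-(31)
    b5, b6, b7 = b1, (var + s0 + s2) / (var + S), s3 / S              # (32)-(34)
    b8 = s3 * (s0 - t1 * t2) / den                                    # (35)
    bs = [b1, b2, b7 - b5 * b8, b8, b3 * b5 * b8 - b3 * b7 - b4 * b8, b6]  # b_1^*, ..., b_6^*
    noise_vars = [S, den / S, s0 * s3 * (t1 + t2) ** 2 / den]         # (26), (27), (36)
    steps = [math.sqrt(12 * v) for v in noise_vars]                   # step^2 / 12 = noise variance
    alpha1, alpha2 = var / (var + s0 + s1), var / (var + s0 + s2)
    beta1 = var * t2 / ((t1 + t2) * (var + s0))
    beta2 = var * t1 / ((t1 + t2) * (var + s0))
    rng, half = random.Random(seed), math.sqrt(3 * var)
    err = [0.0, 0.0, 0.0]
    for _ in range(n_samples):
        x = rng.uniform(-half, half)                                  # non-Gaussian source
        z = [rng.uniform(-h / 2, h / 2) for h in steps]               # independent dithers
        w2p = quantize(x + z[0], steps[0]) - z[0]
        w1 = quantize(bs[0] * x + bs[1] * w2p + z[1], steps[1]) - z[1]
        w2t = quantize(bs[2] * x + bs[3] * w1 + bs[4] * w2p + z[2], steps[2]) - z[2]
        w2 = w2t + bs[5] * w2p
        err[0] += (x - alpha1 * w1) ** 2
        err[1] += (x - alpha2 * w2) ** 2
        err[2] += (x - beta1 * w1 - beta2 * w2) ** 2
    return [v / n_samples for v in err]


print(simulate_split(0.3, 0.2, 0.08, 1.0, s3=0.2))  # targets: 0.3, 0.2, 0.08
```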
It follows from Theorem 5.2 that

$$\frac{1}{n}E\|X-\alpha_iW_i\|^2=\frac{1}{n}E\|X^G-\alpha_iU_i\|^2=D_i,\quad i=1,2,$$
$$\frac{1}{n}E\Big\|X-\sum_{i=1}^{2}\beta_iW_i\Big\|^2=\frac{1}{n}E\Big\|X^G-\sum_{i=1}^{2}\beta_iU_i\Big\|^2=D_3.$$

Let $N_i^*$ be an $n$-dimensional random vector distributed as $-Z_i^*$, $i=1,2,3$. By property 2) of the ECDQ, we have

$$H(Q_{1,n}^*(X+Z_1^*)|Z_1^*)=I(X;X+N_1^*)=h(X+N_1^*)-h(N_1^*),$$
$$\begin{aligned}
H(Q_{2,n}^*(b_1^*X+b_2^*W_2'+Z_2^*)|Z_2^*)&=I(b_1^*X+b_2^*W_2';b_1^*X+b_2^*W_2'+N_2^*)\\
&=h(b_1^*X+b_2^*W_2'+N_2^*)-h(N_2^*),
\end{aligned}$$
$$\begin{aligned}
H(Q_{3,n}^*(b_3^*X+b_4^*W_1+b_5^*W_2'+Z_3^*)|Z_3^*)&=I(b_3^*X+b_4^*W_1+b_5^*W_2';b_3^*X+b_4^*W_1+b_5^*W_2'+N_3^*)\\
&=h(b_3^*X+b_4^*W_1+b_5^*W_2'+N_3^*)-h(N_3^*).
\end{aligned}$$
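Thus we can upper-bound the rate of $Q_{2,n}^*(\cdot)$ (conditioned on $Z_2^*$) as follows:

$$\begin{aligned}
R_1&=\frac{1}{n}H(Q_{2,n}^*(b_1^*X+b_2^*W_2'+Z_2^*)|Z_2^*)\\
&=\frac{1}{n}h(b_1^*X+b_2^*W_2'+N_2^*)-\frac{1}{n}h(N_2^*)\\
&\le\frac{1}{n}h(U_1)-\frac{1}{n}h(N_2^*)\\
&=\frac{1}{2}\log[2\pi e(\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_1}^2)]-\frac{1}{2}\log\frac{EB_3^2}{G_{\rm opt}^{(n)}}\\
&=R_1^G+\frac{1}{2}\log(2\pi eG_{\rm opt}^{(n)}),
\end{aligned}\tag{48}$$

where the inequality follows from Theorem 5.2 and the fact that, for a given covariance matrix, the jointly Gaussian distribution maximizes the differential entropy.

Similarly, the sum-rate of $Q_{1,n}^*(\cdot)$ (conditioned on $Z_1^*$) and $Q_{3,n}^*(\cdot)$ (conditioned on $Z_3^*$) can be upper-bounded as follows:

$$\begin{aligned}
R_2&=\frac{1}{n}H(Q_{1,n}^*(X+Z_1^*)|Z_1^*)+\frac{1}{n}H(Q_{3,n}^*(b_3^*X+b_4^*W_1+b_5^*W_2'+Z_3^*)|Z_3^*)\\
&=\frac{1}{n}h(X+N_1^*)-\frac{1}{n}h(N_1^*)+\frac{1}{n}h(b_3^*X+b_4^*W_1+b_5^*W_2'+N_3^*)-\frac{1}{n}h(N_3^*)\\
&=\frac{1}{n}h(W_2')-\frac{1}{n}h(N_1^*)+\frac{1}{n}h(\tilde W_2)-\frac{1}{n}h(N_3^*)\\
&\le\frac{1}{n}h(U_2')-\frac{1}{n}h(N_1^*)+\frac{1}{n}h(U_2-b_6^*U_2')-\frac{1}{n}h(N_3^*)\\
&\overset{(a)}{=}\frac{1}{n}h(U_2')-\frac{1}{n}h(N_1^*)+\frac{1}{n}h(b_7\tilde B_2+b_8\tilde B_3+\tilde B_4)-\frac{1}{n}h(N_3^*)\\
&=\frac{1}{2}\log[2\pi e(\sigma_X^2+\sigma_{T_0}^2+\sigma_{T_2}^2+\sigma_{T_3}^2)]-\frac{1}{2}\log\frac{EB_2^2}{G_{\rm opt}^{(n)}}+\frac{1}{2}\log[2\pi e(b_7^2E\tilde B_2^2+b_8^2E\tilde B_3^2+E\tilde B_4^2)]-\frac{1}{2}\log\frac{E\tilde B_4^2}{G_{\rm opt}^{(n)}}\\
&=R_2^G+\log(2\pi eG_{\rm opt}^{(n)}),
\end{aligned}\tag{49}$$

where (a) follows from (39).

Remark: Since the decoders only need to know $W_2=\tilde W_2+b_6^*W_2'$ instead of $W_2'$ and $\tilde W_2$ separately, we can actually further reduce $R_2$ to $\frac{1}{n}H(W_2|Z_1^*,Z_2^*,Z_3^*)$. Since $G_{\rm opt}^{(n)}\rightarrow\frac{1}{2\pi e}$ as $n\rightarrow\infty$, it follows from (48) and (49) that $R_1\le R_1^G$ and $R_2\le R_2^G$ as $n\rightarrow\infty$.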

For the special case when $D_3=(1/D_1+1/D_2-1/\sigma_X^2)^{-1}$, the MD quantization systems in Fig. 8 and Fig. 9 degenerate to two independent quantization operations, as shown in Fig. 10. The connection between Fig. 10 and Fig. 6 is apparent.

The above results imply that for general i.i.d. sources, under the same distortion constraints, the rates required by our scheme are upper-bounded by the rates required for the quadratic Gaussian case. This further implies that our scheme can achieve the whole Gaussian MD rate-distortion region as the dimension of the (optimal) lattice quantizers becomes large.
Fig. 10. Separate quantization.

C. Optimality and An Upper Bound on the Coding Rates

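Define $\mathcal{Q}_{\rm out}$ such that $(R_1,R_2,D_1,D_2,D_3)\in\mathcal{Q}_{\rm out}$ if and only if

$$R_i\ge\frac{1}{2}\log\frac{P_X}{D_i},\quad i=1,2,$$
$$R_1+R_2\ge\frac{1}{2}\log\frac{P_X}{D_3}+\frac{1}{2}\log\phi(D_1,D_2,D_3),$$

where

$$\phi(D_1,D_2,D_3)=\begin{cases}1, & D_3<D_1+D_2-P_X,\\[4pt]\dfrac{P_XD_3}{D_1D_2}, & D_3>\left(\dfrac{1}{D_1}+\dfrac{1}{D_2}-\dfrac{1}{P_X}\right)^{-1},\\[10pt]\dfrac{(P_X-D_3)^2}{(P_X-D_3)^2-\left[\sqrt{(P_X-D_1)(P_X-D_2)}-\sqrt{(D_1-D_3)(D_2-D_3)}\right]^2}, & \text{otherwise},\end{cases}$$

and $P_X=2^{2h(X)}/(2\pi e)$ is the entropy power of $X$. It was shown by Zamir [11] that for i.i.d. sources with finite differential entropy, $\mathcal{Q}_{\rm out}$ is an outer bound of the MD rate-distortion region and is asymptotically tight at high resolution (i.e., $D_1,D_2,D_3\rightarrow0$). Again, we only need to consider the case $D_1+D_2-P_X\le D_3\le(1/D_1+1/D_2-1/P_X)^{-1}$. At high resolution, we can write

$$\frac{1}{2}\log\phi(D_1,D_2,D_3)=\frac{1}{2}\log\frac{P_X}{(\sqrt{D_1-D_3}+\sqrt{D_2-D_3})^2}+o(1),$$

where $o(1)\rightarrow0$ as $D_1,D_2,D_3\rightarrow0$.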
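To see how quickly the high-resolution expression takes over, the minimal sketch below compares the exact non-degenerate branch of $\frac{1}{2}\log\phi$ with its approximation. All distortions are scaled down together as $(D_1,D_2,D_3)=(3,2,1)\cdot s$ with $P_X=1$; this path, the base-2 logarithm and the helper names are illustrative assumptions. The difference should shrink toward zero as $s\rightarrow0$.

```python
import math


def half_log_phi(d1, d2, d3, p):
    """(1/2) log2 phi(D1, D2, D3) in the non-degenerate region; p is the entropy power P_X."""
    num = (p - d3) ** 2
    cross = math.sqrt((p - d1) * (p - d2)) - math.sqrt((d1 - d3) * (d2 - d3))
    return 0.5 * math.log2(num / (num - cross ** 2))


def half_log_phi_highres(d1, d2, d3, p):
    """High-resolution form (1/2) log2 [P_X / (sqrt(D1 - D3) + sqrt(D2 - D3))^2]."""
    return 0.5 * math.log2(p / (math.sqrt(d1 - d3) + math.sqrt(d2 - d3)) ** 2)


def approximation_errors(scales, p=1.0):
    """Exact minus approximate value along (D1, D2, D3) = (3, 2, 1) * scale."""
    return [half_log_phi(3 * s, 2 * s, s, p) - half_log_phi_highres(3 * s, 2 * s, s, p)
            for s in scales]


errs = approximation_errors([1e-1, 1e-2, 1e-3, 1e-4])
print(errs)
```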

The following theorem says our scheme is asymptotically optimal at high resolution for general smooth i.i.d. 
sources. 

Theorem 5.3: The region

$$R_i\ge\frac{1}{2}\log\frac{P_X}{D_i}+\frac{1}{2}\log(2\pi eG_{\rm opt}^{(n)})+o(1),\quad i=1,2,$$
$$R_1+R_2\ge\frac{1}{2}\log\frac{P_X}{D_3}+\frac{1}{2}\log\frac{P_X}{(\sqrt{D_1-D_3}+\sqrt{D_2-D_3})^2}+\frac{3}{2}\log(2\pi eG_{\rm opt}^{(n)})+o(1)$$

is achievable using optimal $n$-dimensional lattice quantizers via successive quantization with quantization splitting.

Proof: See Appendix II. ■
Remark:

1) As $D_1, D_2, D_3 \to 0$ and $n \to \infty$, the above region converges to the outer bound and thus is asymptotically tight.

2) The sum-rate redundancy of our MD quantization scheme (i.e., successive quantization with quantization splitting) is at most three times the redundancy of an optimal $n$-dimensional lattice quantizer in the high-resolution regime. It is easy to see from (48) and (49) that for the Gaussian source this is true at all resolutions. Specifically, for scalar quantizers we have $G_1^{\mathrm{opt}} = \frac{1}{12}$, and thus the redundancy is $\frac{3}{2}\log\frac{\pi e}{6}$. This actually overestimates the sum-rate redundancy of our scheme in certain cases. It will be shown in the next section that, for the scalar case, the redundancy is approximately twice the redundancy of a scalar quantizer at high resolution.

3) The successive quantization with quantization splitting can be replaced by timesharing the quantization schemes for the two vertices. Since each vertex requires only two quantization operations, one can show that the redundancy of the timesharing approach is at most twice the redundancy of an optimal $n$-dimensional lattice quantizer.

4) The reason that our MD quantization scheme is asymptotically optimal for all smooth sources is that the universal lossless entropy encoder incorporated in ECDQ can, to some extent, automatically exploit the real distribution of the source.
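The redundancy figures in Remarks 2 and 3 can be evaluated numerically. The sketch below assumes the standard high-resolution rate loss of a lattice quantizer, $\frac{1}{2}\log_2(2\pi e G)$ bits per sample, and the helper name `lattice_redundancy_bits` is hypothetical.

```python
import math


def lattice_redundancy_bits(G: float) -> float:
    """High-resolution rate loss (bits/sample) of a lattice quantizer whose
    normalized second moment is G: (1/2) * log2(2*pi*e*G)."""
    return 0.5 * math.log2(2 * math.pi * math.e * G)


G_SCALAR = 1.0 / 12.0                    # normalized second moment of the scalar (integer) lattice
G_LIMIT = 1.0 / (2 * math.pi * math.e)   # limiting value of G_n^opt as n grows

r_scalar = lattice_redundancy_bits(G_SCALAR)
sum_rate_bound = 3 * r_scalar      # Remark 2: at most three quantizer redundancies
timesharing_bound = 2 * r_scalar   # Remark 3: at most two for vertex timesharing

print(f"one scalar quantizer : {r_scalar:.4f} bits")
print(f"Remark 2 bound (x3)  : {sum_rate_bound:.4f} bits")
print(f"Remark 3 bound (x2)  : {timesharing_bound:.4f} bits")
```

The redundancy vanishes as $G_n^{\mathrm{opt}} \to 1/(2\pi e)$, which is the sense in which the scheme becomes optimal for large lattice dimension.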

The following theorem gives a single-letter upper bound on the rates of our scheme at all resolutions as the dimension of the optimal lattices becomes large.

Theorem 5.4: There exists a sequence of lattice dimensions $n_1, n_2, \ldots$ such that

$$\limsup_{m\to\infty}\frac{1}{n_m}\left[h(\mathbf{X} + \mathbf{N}_1^*) - h(\mathbf{N}_1^*)\right] \le h(X + N_1^G) - h(N_1^G),$$

$$\limsup_{m\to\infty}\frac{1}{n_m}\left[h(b_1^*\mathbf{X} + b_2^*\mathbf{W}_2' + \mathbf{N}_2^*) - h(\mathbf{N}_2^*)\right] \le h(X + b_2^*N_1^G + N_2^G) - h(N_2^G),$$

and

$$\limsup_{m\to\infty}\frac{1}{n_m}\left[h(b_3^*\mathbf{X} + b_4^*\mathbf{W}_1' + b_5^*\mathbf{W}_2' + \mathbf{N}_3^*) - h(\mathbf{N}_3^*)\right] \le h\left((b_3^* + b_1^*b_4^* + b_2^*b_4^* + b_5^*)X + (b_2^*b_4^* + b_5^*)N_1^G + b_4^*N_2^G + N_3^G\right) - h(N_3^G),$$

where $N_1^G \sim \mathcal{N}(0, EB_2^2)$, $N_2^G \sim \mathcal{N}(0, EB_3^2)$, $N_3^G \sim \mathcal{N}(0, EB_4^2)$, and the generic source variable $X$ are all independent.

Remark: This theorem implies that as the dimension of the optimal lattices goes to infinity, the rates required by our scheme can be upper-bounded as

$$R_1 \le h(X + b_2^*N_1^G + N_2^G) - h(N_2^G),$$

$$R_2 \le h(X + N_1^G) - h(N_1^G) + h\left((b_3^* + b_1^*b_4^* + b_2^*b_4^* + b_5^*)X + (b_2^*b_4^* + b_5^*)N_1^G + b_4^*N_2^G + N_3^G\right) - h(N_3^G).$$

By comparing the above two expressions with (41) and (42), we can see that if $X$ is not Gaussian, then $R_1$ and $R_2$ are strictly smaller than the corresponding rates in (41) and (42) for a Gaussian source with the same variance.

Proof: See Appendix III. ■
As mentioned in Section II, by incorporating pre- and postfilters, a single quantizer can be used to sequentially perform three quantization operations instead of using three different quantizers. Let $Q_{i,n}^*(\cdot)$ be an optimal $n$-dimensional lattice quantizer, $i = 1, 2, 3$. The lattices have a "white" quantization noise covariance matrix of the form $\sigma_i^{*2}I_n$, where $\sigma_i^{*2}$ is the second moment of the lattice quantizer $Q_{i,n}^*(\cdot)$. Without loss of generality, we assume $\sigma^{*2} = \sigma_1^{*2}$, i.e., $Q_n^*(\cdot) = Q_{1,n}^*(\cdot)$. We can convert $Q_n^*(\cdot)$ to $Q_{i,n}^*(\cdot)$ by introducing the prefilter $1/\alpha_i^*$ and the postfilter $\alpha_i^*$, where $\alpha_i^* = \sigma_i^*/\sigma_1^*$, $i = 2, 3$. Incorporating the filters into the coefficients of the system gives the system diagram shown in Fig. 11, in which each primed coefficient ($b_1', \ldots, b_6'$, $\alpha_1'$, $\alpha_2'$, $\beta_1'$ and $\beta_2'$) is the corresponding original coefficient with the filter gain $\alpha_2^*$ or $\alpha_3^*$ absorbed into it; for example, $\beta_2' = \beta_2\alpha_3^*$. Although the quantizer $Q_n^*(\cdot)$ can be reused, the dither introduced in each quantization operation should be independent.
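The pre/postfilter idea can be checked in the scalar case ($n = 1$). This is a minimal sketch assuming the integer lattice (unit step) as the base quantizer and subtractive dither; the function names are hypothetical. With prefilter $1/\alpha$, postfilter $\alpha$ and a fresh dither per operation, the reconstruction error should be uniform with variance $\alpha^2/12$, i.e., the second moment scales by $\alpha^2$. This is why a gain equal to the ratio of standard deviations turns one quantizer into another.

```python
import random


def base_quantizer(v: float) -> float:
    """Base scalar lattice quantizer Q(v): nearest integer (unit step)."""
    return float(round(v))


def reused_quantizer(x: float, alpha: float, rng: random.Random) -> float:
    """Emulate a quantizer of step alpha by reusing Q with prefilter 1/alpha,
    postfilter alpha and a fresh subtractive dither z ~ U(-1/2, 1/2)."""
    z = rng.uniform(-0.5, 0.5)
    return alpha * (base_quantizer(x / alpha + z) - z)


def error_variance(alpha: float, trials: int = 200_000, seed: int = 1) -> float:
    """Empirical mean-squared reconstruction error for a Gaussian input."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.gauss(0.0, 3.0)
        err = reused_quantizer(x, alpha, rng) - x
        total += err * err
    return total / trials


if __name__ == "__main__":
    for alpha in (1.0, 0.5, 0.25):
        print(alpha, error_variance(alpha), alpha ** 2 / 12)
```

Drawing the dither inside `reused_quantizer` on every call mirrors the requirement that the dither of each quantization operation be independent even though the quantizer itself is shared.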




Fig. 11. MD lattice quantization with quantizer reuse. 

VI. The Geometric Interpretation of the Scalar Quantization Scheme 

In this section we give a geometric interpretation of our MD quantization scheme when undithered scalar quantization is used in the proposed framework. This interpretation serves as a bridge between the information theoretic description of the coding scheme and the practical quantization operation. Furthermore, it facilitates a high-resolution analysis, which offers a performance comparison between the proposed quantization scheme and existing multiple description quantization techniques. Though only scalar quantization is considered here, the interpretation can also be extended to the vector quantization case.

Fig. 12. Coding scheme using successive quantization in terms of quantization encoder and decoder.

A. The Geometric Interpretation 

It is beneficial to clarify the definition of the encoder and decoder functions of a classical scalar quantizer. The overall quantization can be modeled as composed of three components [61]:

1) The lossy encoder is a mapping $q: \mathcal{R} \to \mathcal{I}$, where the index set $\mathcal{I}$ is usually taken as a collection of consecutive integers. Commonly, this lossy encoder is alternatively specified by a partition of $\mathcal{R}$, i.e., by the boundary points of the partition segments.

2) The lossy decoder is a mapping $q^{-1}: \mathcal{I} \to \mathcal{R}'$, where $\mathcal{R}' \subset \mathcal{R}$ is the reproduction codebook.

3) The lossless encoder $\gamma: \mathcal{I} \to \mathcal{C}$ is an invertible mapping into a collection $\mathcal{C}$ of variable-length binary vectors. This is essentially the entropy coding of the quantization indices.
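The three-part decomposition can be sketched for a uniform scalar quantizer. This is a minimal illustration assuming midpoint reproduction and half-open cells; instead of building an actual variable-length code for $\gamma$, it reports the empirical index entropy as the rate an ideal lossless encoder could approach. All names are hypothetical.

```python
import math
import random
from collections import Counter


def lossy_encoder(x: float, step: float) -> int:
    """q: R -> I. Uniform partition with cells ((k - 1/2)*step, (k + 1/2)*step]."""
    return math.ceil(x / step - 0.5)


def lossy_decoder(k: int, step: float) -> float:
    """q^{-1}: I -> R'. Reproduction codebook = cell midpoints."""
    return k * step


def ideal_index_rate(indices) -> float:
    """Empirical entropy (bits/index) of the index sequence: a stand-in for the
    rate achieved by an ideal lossless encoder gamma."""
    counts = Counter(indices)
    n = len(indices)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    rng = random.Random(0)
    step = 0.1
    xs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
    idx = [lossy_encoder(x, step) for x in xs]
    mse = sum((lossy_decoder(k, step) - x) ** 2 for k, x in zip(idx, xs)) / len(xs)
    print("MSE :", mse, "vs step^2/12 =", step ** 2 / 12)
    print("rate:", ideal_index_rate(idx), "vs h(X) - log2(step) =",
          0.5 * math.log2(2 * math.pi * math.e) - math.log2(step))
```

At high resolution the distortion approaches $\Delta^2/12$ and the index entropy approaches $h(X) - \log_2\Delta$, the two relations used repeatedly in the analysis below.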

The successive quantization coding scheme in Fig. 8 is redrawn in terms of quantization encoder and decoder in Fig. 12. The scaling factors $\alpha_1$, $\alpha_2$, $\beta_1$ and $\beta_2$ are absorbed into the lossy decoders. The lossless encoder $\gamma$, though important, is not essential in this interpretation and is thus omitted in Fig. 12. The lossy decoders in the receiver are the mappings $q_1^{-1}: \mathcal{I}_1 \to \mathcal{R}_1'$, $q_2^{-1}: \mathcal{I}_2 \to \mathcal{R}_2'$ and $q_3^{-1}: \mathcal{I}_1 \times \mathcal{I}_2 \to \mathcal{R}_3'$, respectively; notice that the corresponding lossy encoders do not necessarily exist in the system.

For simplicity, we assume that the lossy encoders $q_a$ and $q_b$ generate uniform partitions of $\mathcal{R}$, while the lossy decoder $q_1^{-1}$ takes the center points of the partition cells of $q_a$ as the reproduction codebook. Notice that the function $y = q_1^{-1}(q_a(x))$ is piecewise constant. A linear combination of $x$ and $y$ is then formed as $s = \alpha_1x + \alpha_2y$, which is then mapped by the lossy encoder $q_b$ to a quantization index $q_b(s)$. The task is to find the partition formed by these operations, and this can be done by considering a partition cell $i$, given by $(s_i, s_{i+1}]$, in the lossy encoder $q_b$.

8 Although the ECDQ-based MD scheme considered in the preceding section is certainly of practical value, we mainly use it as an analytical tool to establish the optimality of our scheme. In practice, it is more desirable to have an MD scheme based on low-complexity undithered quantization.

Fig. 13. The geometric interpretation of the partitions using successive quantization.

In Fig. 13 this partition cell is represented on the $(x, y)$ plane. For operating points on the dominant face of the SEGC region, it is always true that $\alpha_2 < 0$ [from (18)], and thus the slope of the line $\alpha_1x + \alpha_2y = s_i$ is always positive. It is clear that, given $q_b(s) = i$, $x$ can fall only into the several segments highlighted by the thicker lines in Fig. 13, i.e., into the set $C_s(i) = \{x : \alpha_1x + \alpha_2q_1^{-1}(q_a(x)) \in (s_i, s_{i+1}]\}$. This information regarding $x$ is thus revealed to the lossy decoder $q_2^{-1}$. Through the lossy encoder $q_a$, information is revealed to the lossy decoder $q_1^{-1}$ in the traditional manner: when index $j$ is specified, $x$ lies in the $j$-th cell $(x_j, x_{j+1}]$, which we denote by $C_x(j) = (x_j, x_{j+1}]$. Jointly, the lossy decoder $q_3^{-1}$ has the information that $x$ is in the intersection $C_s(i) \cap C_x(j)$ of the two sets.
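The set $C_s(i)$ can be computed in closed form for uniform $q_a$ and $q_b$. The sketch below assumes half-open cells, midpoint reproduction for $q_1^{-1}$, $\alpha_1 > 0$, $\alpha_1 + \alpha_2 > 0$, and a locally uniform source density when scoring a cell; `cell_Cs` and `uniform_mse` are hypothetical names. With $\alpha_1 = 2$, $\alpha_2 = -1$ and $\Delta_a \gg \Delta_b$ (the configuration of Fig. 15(b)), each cell splits into two pieces of length $\Delta_b/2$ whose midpoints are $\Delta_a/2$ apart, and the centroid decoder then incurs about $\frac{3}{4}$ of the $q_a$ distortion $\Delta_a^2/12$.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]


def cell_Cs(i: int, alpha1: float, alpha2: float,
            step_a: float, step_b: float) -> List[Interval]:
    """C_s(i) = {x : alpha1*x + alpha2*y(x) in [i*step_b, (i+1)*step_b)} as a
    list of half-open x-intervals, where y(x) = k*step_a is the midpoint of the
    q_a cell [(k - 1/2)*step_a, (k + 1/2)*step_a) that contains x."""
    assert alpha1 > 0 and alpha1 + alpha2 > 0
    s_lo, s_hi = i * step_b, (i + 1) * step_b
    # Inside q_a cell k, s sweeps a length-alpha1*step_a interval centred at
    # (alpha1 + alpha2)*k*step_a, so only a few k near the s-cell matter.
    k_mid = round(0.5 * (s_lo + s_hi) / ((alpha1 + alpha2) * step_a))
    reach = math.ceil(alpha1 / (alpha1 + alpha2)) + 1
    pieces = []
    for k in range(k_mid - reach, k_mid + reach + 1):
        y = k * step_a
        # Invert s = alpha1*x + alpha2*y (increasing in x since alpha1 > 0)
        # and clip to the q_a cell.
        lo = max((s_lo - alpha2 * y) / alpha1, (k - 0.5) * step_a)
        hi = min((s_hi - alpha2 * y) / alpha1, (k + 0.5) * step_a)
        if lo < hi:
            pieces.append((lo, hi))
    return pieces


def uniform_mse(pieces: List[Interval]) -> float:
    """MSE of reconstructing at the centroid of the union, assuming a locally
    uniform density (high-resolution approximation)."""
    total = sum(b - a for a, b in pieces)
    mean = sum((b * b - a * a) / 2 for a, b in pieces) / total
    second = sum((b ** 3 - a ** 3) / 3 for a, b in pieces) / total
    return second - mean * mean


if __name__ == "__main__":
    pieces = cell_Cs(5, alpha1=2.0, alpha2=-1.0, step_a=1.0, step_b=1 / 16)
    print(pieces)
    print("D_2 / D_1 =", uniform_mse(pieces) / (1 / 12))
```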

Now we briefly discuss the extension of this interpretation to the case of quantization splitting. The coding scheme in Fig. 5 is redrawn in Fig. 14. Some of the operations in Fig. 5 are absorbed into the lossy decoders. It can be observed that $q_a$, $q_1^{-1}$ and $q_b$ play roles similar to those in Fig. 12; thus, the geometric interpretation for successive quantization can still be utilized. Let $s = b_1^*x + b_2^*q_1^{-1}(q_a(x))$ and define $C_s(j) = \{x : b_1^*x + b_2^*q_1^{-1}(q_a(x)) \in (s_j, s_{j+1}]\}$, where $(s_j, s_{j+1}]$ is the $j$-th partition cell in the lossy encoder $q_b$. The variable $s$ is defined differently from that in successive quantization, but this slight abuse of notation does not cause any ambiguity.

Notice that the index $i = (i_a, i_c)$ has two components: one is the output of $q_a$, and the other is that of $q_c$. In a sense, $q_a$ and $q_c$ are formed in a refinement manner. Thus, the lossy encoder $q_c$ and the lossy decoders $q_1^{-1}$ and $q_3^{-1}$ always have the exact output of $q_a$, which in effect confines the source to a finite range. Thus, we need to consider only the case of a fixed $q_a(x)$ value. It is obvious that when $q_a(x) = i_0$ is fixed, $q_1^{-1}(i_0) = y_{i_0}$. Consider the linear combination $r = b_3^*x + b_4^*t + b_5^*q_1^{-1}(q_a(x))$, where $t = q_2^{-1}(q_b(s))$. It is similar to the linear combination $s = b_1^*x + b_2^*y$, but with the additional constant term $b_5^*y_{i_0}$ when $i_0$ is given. It can be shown that this constant term in fact removes the conditional mean, such that $E(r \mid q_a(x) = i_0) \approx 0$, and the lossy encoder $q_c$ is merely a partition of an interval near zero. Thus, with $q_a(x) = i_0$ given, $q_b$, $q_2^{-1}$ and $q_c$ essentially adopt the same roles as $q_a$, $q_1^{-1}$ and $q_b$, respectively, in Fig. 12. This implies that a similar geometric interpretation again holds for the additional components in Fig. 14, since $b_4^* = b_8 < 0$ and $b_3^* = b_7 - b_5b_8 > 0$. Define $C_{xr}(i_a, i_c) = \{x : x \in (x_{i_a}, x_{i_a+1}],\ r \in (r_{i_c}, r_{i_c+1}]\}$, where $(x_{i_a}, x_{i_a+1}]$ is the $i_a$-th partition cell in the lossy encoder $q_a$ and $(r_{i_c}, r_{i_c+1}]$ is the $i_c$-th partition cell in the lossy encoder $q_c$. Given the index pair $(j, (i_a, i_c))$, the joint lossy decoder $q_3^{-1}$ is provided with the information that $x \in C_s(j) \cap C_{xr}(i_a, i_c)$.

Fig. 14. Coding scheme using quantization splitting in terms of quantization encoder and decoder.



B. High-Resolution Analysis of Several Special Cases 

Below, the high-resolution performance of the proposed coding scheme using scalar quantization is analyzed under several special conditions. Of particular interest is the balanced case, where $R_1 = R_2 = R$ and the two side distortions are equal, $D_1 = D_2$; significant research effort has been devoted to this case. In the analysis that follows, simplicity is often given priority over rigor, in keeping with the purpose of this section, which is to provide an intuitive interpretation of the coding schemes.

For the balanced case, it can be shown [21] that at high resolution, if the side distortion is of the form $D_1 = b\sigma_X^2 2^{-2(1-\eta)R}$, where $0 \le \eta < 1$ and $b \ge 1$, the central distortion of an MD system can asymptotically achieve

$$D_3 = \begin{cases} \dfrac{\sigma_X^2 2^{-2R}}{2\left(b + \sqrt{b^2 - 1}\right)}, & \eta = 0,\\[2mm] \dfrac{\sigma_X^2 2^{-2R(1+\eta)}}{4b}, & 0 < \eta < 1. \end{cases} \tag{50}$$

Notice that the condition $0 < \eta < 1$ in fact corresponds to the condition that $\sigma_X^2 \gg D_1$ and $D_1 \gg D_3$ at high rate. In this case, the product of the central and side distortions remains bounded below by a constant at a fixed rate, namely $D_3D_1 \ge \frac{\sigma_X^4 2^{-4R}}{4}$, independent of the tradeoff between them. This product has been used as the information theoretic bound to measure the efficiency of quantization methods [19], [20], [22], [27], [30], [62]. For the sake of simplicity, we focus on the zero-mean Gaussian source; however, because of the tightness of the Shannon lower bound at high resolution [11], the results of the analysis are applicable with minor changes to other continuous sources with smooth probability density functions.

Fig. 15. Several special cases of the partition formed using successive quantization. (a) $\alpha_1 = 2$, $\alpha_2 = -1$, and the stepsize of $q_a$ is the same as that of $q_b$. (b) $\alpha_1 = 2$, $\alpha_2 = -1$, but the stepsize of $q_a$ is much larger than that of $q_b$. (c) When the stepsize of $q_a$ is much larger than that of $q_b$, by slightly varying $\alpha_1$ and $\alpha_2$, the two side distortions can be made equal.
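Formula (50) can be checked numerically. This minimal sketch assumes unit source variance, and its function names are hypothetical. For $0 < \eta < 1$ the product $D_1D_3$ equals $\sigma_X^4 2^{-4R}/4$ for every $b$, whereas for $\eta = 0$ it exceeds that value by the factor $2b/(b + \sqrt{b^2-1})$, which equals 2 at $b = 1$ and tends to 1 as $b$ grows.

```python
import math


def side_distortion(b: float, eta: float, R: float, var: float = 1.0) -> float:
    """D_1 = b * var * 2^{-2(1 - eta) R}."""
    return b * var * 2 ** (-2 * (1 - eta) * R)


def central_distortion(b: float, eta: float, R: float, var: float = 1.0) -> float:
    """High-resolution central distortion D_3 from (50)."""
    if eta == 0.0:
        return var * 2 ** (-2 * R) / (2 * (b + math.sqrt(b * b - 1)))
    return var * 2 ** (-2 * R * (1 + eta)) / (4 * b)


def normalized_product(b: float, eta: float, R: float) -> float:
    """D_1 * D_3 divided by var^2 * 2^{-4R} / 4 (equal to 1 on the bound)."""
    bound = 2 ** (-4 * R) / 4
    return side_distortion(b, eta, R) * central_distortion(b, eta, R) / bound


if __name__ == "__main__":
    for b, eta in [(1.0, 0.0), (4.0, 0.0), (2.0, 0.3), (10.0, 0.7)]:
        print(b, eta, normalized_product(b, eta, R=8.0))
```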

1) High-resolution analysis for successive quantization: Consider using the quantization method depicted in Fig. 12 to construct two descriptions such that $D_1 = D_2$, though the rates of the two descriptions are not necessarily equal. For the case $\sigma_X^2 \gg D_1$ and $D_1 \gg D_3$ at high rate, it follows that $\alpha_1 \approx 2$ and $\alpha_2 \approx -1$ [from (18)], which suggests that the slope of the line $\alpha_1x + \alpha_2y = s_i$ should be approximately 2 in this case.

Next we consider the three cases depicted in Fig. 15. In Fig. 15(a), $\alpha_1 = 2$ and $\alpha_2 = -1$ are chosen. By properly choosing the thresholds and the stepsize, a symmetric (between the two descriptions) partition can be formed. In this partition, the cells $C_x(\cdot)$ and the cells $C_s(\cdot)$ are both intervals. Furthermore, they form two uniform scalar quantizers with their bins staggered by half the stepsize. This in effect gives the staggered index assignment of [22], [63]. By using this partition, the central distortion is reduced to 1/4 of the side distortions. Notice that in this case the condition $D_1 \gg D_3$ does not hold, but choosing $\alpha_1 = 2$, $\alpha_2 = -1$ indeed generates two balanced descriptions; this suggests that a certain discrepancy occurs when applying the information theoretic results directly to the scalar quantization case. The high-resolution performance of the partition in Fig. 15(a) is straightforward, being given by $D_1 = D_2 \approx \frac{\Delta_a^2}{12} \approx \frac{\pi e}{6}2^{-2R_1}\sigma_X^2$, where the second approximation holds when entropy coding is assumed, and $D_3 \approx \frac{1}{4}D_1$ (also see [62]).
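The staggered configuration of Fig. 15(a) can be simulated directly. This sketch assumes a source that is locally uniform (drawn uniformly over a wide range), midpoint side decoders, and a central decoder that reconstructs at the midpoint of the intersection of the two cells; the helper names are hypothetical. It reproduces $D_1 = D_2 \approx \Delta^2/12$ and $D_3 \approx D_1/4$, because every intersection cell has length $\Delta/2$.

```python
import math
import random


def cell(x: float, step: float, offset: float) -> tuple:
    """Cell [lo, lo + step) of a uniform quantizer with thresholds at offset + k*step."""
    lo = offset + math.floor((x - offset) / step) * step
    return lo, lo + step


def staggered_distortions(step: float = 0.5, trials: int = 200_000, seed: int = 7):
    """Empirical (D_1, D_2, D_3) for two uniform quantizers staggered by step/2."""
    rng = random.Random(seed)
    d1 = d2 = d3 = 0.0
    for _ in range(trials):
        x = rng.uniform(-20.0, 20.0)
        a_lo, a_hi = cell(x, step, 0.0)          # description 1
        b_lo, b_hi = cell(x, step, step / 2)     # description 2, shifted by half a step
        lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)  # joint (central) cell
        d1 += (x - (a_lo + a_hi) / 2) ** 2
        d2 += (x - (b_lo + b_hi) / 2) ** 2
        d3 += (x - (lo + hi) / 2) ** 2
    return d1 / trials, d2 / trials, d3 / trials


if __name__ == "__main__":
    D1, D2, D3 = staggered_distortions()
    print(D1, D2, D3, D3 / D1)
```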

In Fig. 15(b), the stepsize of $q_b$, denoted by $\Delta_b$, is chosen to be much smaller than that of $q_a$, denoted by $\Delta_a$; however, $\alpha_1 = 2$ and $\alpha_2 = -1$ are kept unchanged. In this case, the partition by $q_a$ is still uniform, and the performance of $q_1^{-1}$ is given by $D_1 \approx \frac{\Delta_a^2}{12} \approx \frac{\pi e}{6}2^{-2R_1}\sigma_X^2$. This differs from the previous case in that most of the cells $C_s(i)$ are no longer intervals but rather the union of two non-contiguous intervals when $\Delta_a \gg \Delta_b$; a small portion of the $C_s$ cells can each consist of three non-contiguous intervals, but when $\Delta_a \gg \Delta_b$ this portion is negligible and is omitted in the discussion that follows. Furthermore, cell $C_s(i)$ approximately consists of two intervals of length $\Delta_b/2$ whose midpoints are $\frac{1}{2}\Delta_a$ apart. The distortion achieved by using this partition in the lossy decoder $q_2^{-1}$ is

$$D_2 \approx \left(\tfrac{1}{4}\Delta_a\right)^2 = \tfrac{3}{4}D_1. \tag{51}$$

Intuitively, this says that the average distance of the points in the cell $C_s(i)$ from its reproduction codeword is approximately $\frac{1}{4}\Delta_a$, which is obviously true given the geometric structure of the cell $C_s(i)$. Note that $D_1$ and $D_2$ are not equal.

The rate of the second description is less straightforward, but consider the joint partition revealed to $q_3^{-1}$. This partition is almost uniform, with cells of length $\Delta_b/2$, while the rate of the output of $q_b$ after entropy coding is one bit less than that obtained when the same partition is used in a classical quantizer, because each cell $C_s(i)$ consists of two local intervals instead of one as in the classical quantizer. Thus,

$$R_2 \approx h(p) - \log_2\frac{\Delta_b}{2} - 1 = h(p) - \log_2\Delta_b, \tag{52}$$

where $h(p)$ is the differential entropy of the source.

It follows that an achievable high-resolution operating point using scalar quantization is $(R_1, R_2, D_1, D_2, D_3)$, where $D_1 = \frac{\pi e}{6}2^{-2R_1}\sigma_X^2$, $D_2 = \frac{3}{4}D_1$ and $D_3 = \frac{\pi e}{24}2^{-2R_2}\sigma_X^2$; by symmetry, the operating point $(R_2, R_1, D_2, D_1, D_3)$ is also achievable. By time-sharing, an achievable balanced point is $\left(\frac{R_1+R_2}{2}, \frac{R_1+R_2}{2}, \frac{7}{8}D_1, \frac{7}{8}D_1, D_3\right)$. The product of the central and side distortions is then $\frac{7}{8}\left(\frac{\pi e}{12}\right)^2 2^{-2(R_1+R_2)}\sigma_X^4$, which is only about 2.5 dB away from the information theoretic distortion product. However, time-sharing is not strictly scalar quantization, and later we discuss a method to avoid the time-sharing argument.

In order to make $D_1 = D_2$ when $\Delta_a \gg \Delta_b$, the value of $\alpha_2$ can be varied slightly. First, let $\Delta_a$ be fixed, so that $D_1$ ($\approx \frac{\pi e}{6}2^{-2R_1}\sigma_X^2$) and $R_1$ are both fixed. With the stepsize $\Delta_b$ fixed, it is clear that as $\alpha_2$ decreases from $-1$, the distortion $D_2$ increases. A simple calculation shows that when $\alpha_2 = -4/3$, $D_2 > D_1$; thus, the desired value of $\alpha_2$ is in $(-4/3, -1)$, and we find this value to be $\alpha_2 = -1.0445$. The detailed calculation is relegated to Appendix IV, where the computation of the distortions and rates of this particular quantizer is also given. By using such a value, it can be shown that an achievable high-resolution operating point is $(R_1, R_2, D_1, D_2, D_3)$, where $D_1 = D_2 \approx \frac{\pi e}{6}2^{-2R_1}\sigma_X^2$ and $D_3 \approx 0.8974\cdot\frac{\pi e}{24}2^{-2R_2}\sigma_X^2$. The rates $R_1$ and $R_2$ are usually not equal, but the results derived here will be used to construct two balanced descriptions next.

2) Balanced descriptions using quantization splitting: As previously pointed out, the parameters of the quantization splitting coding scheme must be chosen accordingly when balanced descriptions are required; in the high-resolution regime this leads to $b_1^* \approx 2$, $b_2^* \approx -1$, $b_3^* \approx 2$, $b_4^* \approx -1$ and $b_5^* \approx 3$. We make the following remarks assuming these values.
• The conditional expectation $E(r \mid q_a(x) = i_0)$ is approximately zero, which implies that only the case in which $q_1^{-1}(q_a(x)) = 0$ needs to be considered. This is obvious from the geometric structure given in Fig. 15(b) and the values of the $b_i^*$'s.

• The partition formed by $q_c$ does not improve the distortion $D_1$ over $q_a$. This is because the line $b_3^*x + b_4^*t + b_5^*y_{i_0} = r_i$ on the $(x, t)$ plane is oriented so that it almost aligns with the function $t = f(x)$. In such a case, the cell $C_{xr}(i_a, i_c)$ consists of segments from almost every cell $C_s(j)$ for which $C_s(j) \cap \{x : q_a(x) = i_a\} \neq \emptyset$. Intuitively, this is similar to letting the line $\alpha_1x + \alpha_2y = s_i$ have a slope of 1 in Fig. 15(b), such that the distortion $D_2$ does not improve much over $\sigma_X^2$ in the successive quantization case.

With these two remarks, consider constructing balanced descriptions using scalar quantization for $R_1 = R_2$ as follows. Choose $b_1^* = 2$ and $b_2^* = -1.0445$ such that, without the lossy encoder $q_c$, the distortions $D_1$ and $D_2$ are made equal. Denote the entropy rate of $q_a$ as $R_{1a}$ and that of $q_b$ as $R_2$. Let $b_3^* = 2$ and $b_4^* = -1$, but $b_5^* = 2.9555$, such that $E(r \mid q_a(x) = i_0)$ is approximately zero. By doing this, the line $b_3^*x + b_4^*t + b_5^*y_{i_0} = r_i$ on the $(x, t)$ plane aligns with the function $t = f(x)$, and thus the remaining rate $R_1 - R_{1a}$ is used by $q_c$ to improve $D_3$, while $D_1$ and $D_2$ are not further improved. Since $q_a$ and $q_b$ both operate at high resolution, assuming that $R_1 - R_{1a}$ is also high, $q_c$ partitions each set $C_s(j) \cap C_{xr}(i_a, i_c)$ into $2^{R_1 - R_{1a}}$ uniform segments, thus improving $D_3$ by a factor of $2^{-2(R_1 - R_{1a})}$.

Using this construction, we can achieve a balanced high-resolution operating point $(R_1, R_1, D_1, D_1, D_3)$ without time-sharing, where $D_1 = \frac{\pi e}{6}2^{-2R_{1a}}\sigma_X^2$ and $D_3 \approx 0.8974\cdot\frac{\pi e}{24}2^{-2(2R_1 - R_{1a})}\sigma_X^2$. Thus, when $\sigma_X^2 \gg D_1 = D_2 \gg D_3$, the product of the central and side distortions is 2.596 dB away from the information theoretic distortion product. This is a better upper bound than the best previously known upper bound on the granular distortion using scalar quantization, which is 2.67 dB away from the information theoretic distortion product [22]; this previous bound was derived in [22] using the multiple description scalar quantization scheme proposed by Vaishampayan [19], [20] with systematic optimization of the quantization thresholds. It should be pointed out that the results regarding the granular distortion also apply to other continuous sources, as in the approach taken in [22]. Thus, for any source with a smooth pdf, this granular distortion can be within 2.596 dB of the Shannon outer bound, which is tight at high resolution.
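The constants quoted in this subsection (2.5 dB, $\alpha_2 = -1.0445$, 0.8974 and 2.596 dB) can be reproduced from the high-resolution formulas. This is a minimal sketch that takes those formulas as given: the side-distortion expression $D_2 = -\frac{1}{16}(5\alpha_2 + 4)\alpha_2^2\Delta_a^2$ from (55) in Appendix IV, solved for $D_2 = D_1 = \Delta_a^2/12$ by bisection on $(-4/3, -1)$, and the rate offset $(3 + 3\alpha_2)\log_2\frac{3}{2}$ from (58) converted into a multiplicative factor on $D_3$. The helper names are hypothetical. With the exact root the factor comes out near 0.8975, which matches the quoted 0.8974 up to the rounding of $\alpha_2$.

```python
import math

PI_E = math.pi * math.e


def gap_db(product_const: float) -> float:
    """Gap in dB between D_1*D_3 = product_const * var^2 * 2^{-4R} and the
    high-resolution bound var^2 * 2^{-4R} / 4."""
    return 10 * math.log10(4 * product_const)


def d2_over_delta_a_sq(a2: float) -> float:
    """D_2 / Delta_a^2 as given by (55): -(1/16) * (5*a2 + 4) * a2^2."""
    return -(5 * a2 + 4) * a2 ** 2 / 16


def solve_alpha2(lo: float = -4 / 3, hi: float = -1.0, iters: int = 80) -> float:
    """Bisection for D_2 = D_1 = Delta_a^2 / 12 on the bracket (-4/3, -1)."""
    def f(a: float) -> float:
        return d2_over_delta_a_sq(a) - 1 / 12

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:   # sign change lies in [lo, mid]
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


alpha2 = solve_alpha2()
# Rate offset (3 + 3*alpha2) * log2(3/2) from (58), expressed as a factor on D_3.
d3_factor = 1.5 ** (2 * (3 + 3 * alpha2))

# Case (b) with time-sharing: (7/8) D_1 * D_3 with D_1 = (pi e/6) 2^{-2R_1}, D_3 = (pi e/24) 2^{-2R_2}.
timeshare_const = (7 / 8) * (PI_E / 6) * (PI_E / 24)
# Quantization splitting: D_1 = (pi e/6) 2^{-2R_1a}, D_3 = factor * (pi e/24) 2^{-2(2R_1 - R_1a)}.
split_const = d3_factor * (PI_E / 6) * (PI_E / 24)

print(f"alpha_2      = {alpha2:.4f}")
print(f"D_3 factor   = {d3_factor:.4f}")
print(f"time-sharing : {gap_db(timeshare_const):.3f} dB")
print(f"splitting    : {gap_db(split_const):.3f} dB")
```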

C. Optimization of Scalar Quantization Scheme 

The analysis in the previous subsection reveals that, for the scalar case, the proposed coding scheme can potentially achieve better performance than previous techniques based on scalar quantization [19], [20], [22]. However, for the proposed coding scheme to perform competitively with scalar quantization at low rate, better methods to optimize the quantizer should be used. Specifically, the following improvements are immediate:

• Given the partition formed by the lossy encoders, the lossy decoders $q_1^{-1}$, $q_2^{-1}$ and $q_3^{-1}$ should optimize the reproduction codebook to be the conditional mean of the codecells.

• The indices $i_a$ and $i_b$ should be jointly entropy-coded instead of being separately coded, and such a joint codebook should be designed.

• The lossy encoder $q_c$ can be designed for each output index of $q_a$, and thus operate adaptively.

• The encoder partition should be better optimized; the design method for multi-stage vector quantization offers a possible approach [64].

These improvements currently are under investigation; a systematic comparison of these improvements is beyond 
the scope of this article and thus will not be included. 
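For the first improvement above, the codeword of a cell $C$ is $E[X \mid X \in C]$. The following is a minimal sketch for a Gaussian source, assuming each cell is given as a union of intervals (as the cells $C_s(i)$ are). The helper name is hypothetical, and only the centroid step is shown, not a full codebook design.

```python
import math
from typing import Iterable, Tuple


def gaussian_centroid(intervals: Iterable[Tuple[float, float]], sigma: float = 1.0) -> float:
    """Conditional mean E[X | X in union of intervals] for X ~ N(0, sigma^2).
    Uses  int_a^b x phi(x) dx = sigma^2 (phi(a) - phi(b))  and
    P(a < X < b) = Phi(b) - Phi(a), with phi/Phi the N(0, sigma^2) pdf/cdf."""
    def pdf(x: float) -> float:
        return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2))))

    mass = moment = 0.0
    for a, b in intervals:
        mass += cdf(b) - cdf(a)
        moment += sigma ** 2 * (pdf(a) - pdf(b))
    return moment / mass


if __name__ == "__main__":
    # A two-piece cell such as C_s(i): the codeword lies between the pieces,
    # pulled toward the piece with higher probability.
    print(gaussian_centroid([(0.15, 0.19), (0.65, 0.69)]))
```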



VII. Conclusion

We proposed a lattice quantization scheme that can achieve the whole Gaussian MD rate-distortion region. Our scheme is universal in the sense that it only needs the first- and second-order statistics of the source. Our scheme is optimal for Gaussian sources at any resolution and asymptotically optimal for all smooth sources at high resolution.

Our results, along with recent work by Erez and Zamir [65], consolidate the link between MMSE estimation and lattice coding (/quantization), or, in a more general sense, the connection between Wiener and Shannon theories as illuminated by Forney [66], [67].

Although the linear MMSE structure is optimal in achieving the Gaussian MD rate-distortion region as the dimension of the (optimal) lattice quantizers goes to infinity, it is not optimal for finite-dimensional lattice quantizers, since the distribution of the quantization errors is then no longer Gaussian. Using a nonlinear structure to exploit higher-order statistics may result in better performance.

We also want to point out that our derivation does not rely on the source being i.i.d. in time. The proposed MD quantization system is directly applicable to a general stationary source, although it may be more desirable to whiten the process first.

Appendix I

Gram-Schmidt Orthogonalization for Random Vectors

Let $\mathcal{H}_v$ denote the set of all $n$-dimensional$^9$, finite-covariance-matrix, zero-mean, real random (column) vectors. $\mathcal{H}_v$ becomes a Hilbert space under the inner product mapping

$$\langle\mathbf{X}, \mathbf{Y}\rangle = E(\mathbf{X}\mathbf{Y}^T): \mathcal{H}_v \times \mathcal{H}_v \to \mathbb{R}^{n\times n}.$$

For $\mathbf{X}^M = (\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_M)^T$ with $\mathbf{X}_i \in \mathcal{H}_v$, $i = 1, \ldots, M$, the Gram-Schmidt orthogonalization proceeds as follows:

$$\mathbf{B}_1 = \mathbf{X}_1,$$
$$\mathbf{B}_i = \mathbf{X}_i - \sum_{j=1}^{i-1}E(\mathbf{X}_i\mathbf{B}_j^T)\left[E(\mathbf{B}_j\mathbf{B}_j^T)\right]^{-1}\mathbf{B}_j, \quad i = 2, \ldots, M.$$

Note: $E(\mathbf{X}_i\mathbf{B}_j^T)\left[E(\mathbf{B}_j\mathbf{B}_j^T)\right]^{-1}$ can be any matrix in $\mathbb{R}^{n\times n}$ if $E(\mathbf{B}_j\mathbf{B}_j^T) = 0$.

9 This condition is introduced just for the purpose of simplifying the notation.

We can also write

$$\mathbf{B}_1 = \mathbf{X}_1,$$
$$\mathbf{B}_i = \mathbf{X}_i - K_{i-1}\mathbf{X}^{i-1}, \quad i = 2, \ldots, M,$$

where $K_{i-1} \in \mathbb{R}^{n\times(i-1)n}$ is a matrix satisfying $K_{i-1}K_{\mathbf{X}^{i-1}} = K_{\mathbf{X}_i\mathbf{X}^{i-1}}$. When $K_{\mathbf{X}^{i-1}}$ is invertible, we have $K_{i-1} = K_{\mathbf{X}_i\mathbf{X}^{i-1}}K_{\mathbf{X}^{i-1}}^{-1}$. Here $K_{\mathbf{X}^{i-1}}$ is the covariance matrix of $(\mathbf{X}_1, \ldots, \mathbf{X}_{i-1})^T$ and $K_{\mathbf{X}_i\mathbf{X}^{i-1}} = E[\mathbf{X}_i(\mathbf{X}_1^T, \ldots, \mathbf{X}_{i-1}^T)]$.

Again, a sequential quantization system can be constructed with $\mathbf{X}_1$ as the input to generate a zero-mean random vector $\hat{\mathbf{X}}^M = (\hat{\mathbf{X}}_1, \ldots, \hat{\mathbf{X}}_M)^T$ whose covariance matrix is also $K_{\mathbf{X}^M}$. Assume that $K_{\mathbf{B}_i} = E\mathbf{B}_i\mathbf{B}_i^T$ is nonsingular for $i = 2, \ldots, M$. Let $Q_{i,n}(\cdot)$ be an $n$-dimensional lattice quantizer, $i = 1, 2, \ldots, M-1$. The dither $\mathbf{Z}_i$ is an $n$-dimensional random vector, uniformly distributed over the basic cell of $Q_{i,n}$, $i = 1, 2, \ldots, M-1$. Suppose $(\mathbf{X}_1, \mathbf{Z}_1, \ldots, \mathbf{Z}_{M-1})$ are independent, and $E\mathbf{Z}_i\mathbf{Z}_i^T = K_{\mathbf{B}_{i+1}}$, $i = 1, 2, \ldots, M-1$. Define

$$\hat{\mathbf{X}}_1 = \mathbf{X}_1, \tag{53}$$

$$\hat{\mathbf{X}}_i = Q_{i-1,n}\left(K_{i-1}\hat{\mathbf{X}}^{i-1} + \mathbf{Z}_{i-1}\right) - \mathbf{Z}_{i-1}, \quad i = 2, \ldots, M. \tag{54}$$

It is easy to show that $\mathbf{X}^M$ and $\hat{\mathbf{X}}^M$ have the same covariance matrix.
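For scalar components the orthogonalization reduces to solving small linear systems. The sketch below uses hypothetical helper names and a small hand-written solver so that it has no dependencies. It computes the coefficient rows $K_{i-1}$ and the innovation variances $EB_i^2$ from a covariance matrix, then checks that the additive-noise model behind (53)-(54) reproduces the original covariance. That model assumes each dithered quantization stage adds independent noise with variance $EB_i^2$, as ECDQ does. As a by-product, $\prod_i EB_i^2 = \det K_{\mathbf{X}^M}$.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (A nonsingular)."""
    n = len(A)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gram_schmidt(K: Matrix) -> Tuple[List[List[float]], List[float]]:
    """Coefficient rows K_{i-1} (solving K_{i-1} K_{X^{i-1}} = K_{X_i X^{i-1}})
    and innovation variances E B_i^2 for scalar X_1, ..., X_M with covariance K."""
    coeffs, innov = [], []
    for i in range(len(K)):
        cross = K[i][:i]
        c = solve([row[:i] for row in K[:i]], cross) if i else []
        coeffs.append(c)
        innov.append(K[i][i] - sum(ci * ki for ci, ki in zip(c, cross)))
    return coeffs, innov


def sequential_covariance(coeffs: List[List[float]], innov: List[float]) -> Matrix:
    """Covariance of Xhat_i = K_{i-1} Xhat^{i-1} + N_{i-1}, where N_{i-1} is
    independent noise with variance E B_i^2 (the additive model of ECDQ)."""
    M = len(innov)
    C = [[0.0] * M for _ in range(M)]
    for i in range(M):
        c = coeffs[i]
        for j in range(i):
            C[i][j] = C[j][i] = sum(c[k] * C[k][j] for k in range(i))
        C[i][i] = sum(c[k] * c[l] * C[k][l] for k in range(i) for l in range(i)) + innov[i]
    return C


if __name__ == "__main__":
    K = [[4.0, 2.0, 1.0], [2.0, 3.0, 1.0], [1.0, 1.0, 2.0]]
    coeffs, innov = gram_schmidt(K)
    print(coeffs, innov)
    print(sequential_covariance(coeffs, innov))
```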

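As in the scalar case, a single quantizer can be reused if pre- and post-filters are incorporated. Specifically, given an $n$-dimensional lattice quantizer $Q_n(\cdot)$, let the dither $\mathbf{Z}_i'$ be an $n$-dimensional random vector, uniformly distributed over the basic cell of $Q_n$, with nonsingular covariance matrix $K_{\mathbf{Z}'} = E\mathbf{Z}_i'\mathbf{Z}_i'^T$. Let $A_i$ be an $n\times n$ nonsingular matrix$^{10}$ such that $A_iK_{\mathbf{Z}'}A_i^T = K_{\mathbf{B}_{i+1}}$, $i = 1, 2, \ldots, M-1$. Suppose $(\mathbf{X}_1, \mathbf{Z}_1', \ldots, \mathbf{Z}_{M-1}')$ are independent. Define

$$\hat{\mathbf{X}}_1 = \mathbf{X}_1,$$

$$\hat{\mathbf{X}}_i = A_{i-1}\left[Q_n\left(A_{i-1}^{-1}K_{i-1}\hat{\mathbf{X}}^{i-1} + \mathbf{Z}_{i-1}'\right) - \mathbf{Z}_{i-1}'\right], \quad i = 2, \ldots, M.$$

It is easy to verify that $\mathbf{X}^M$ and $\hat{\mathbf{X}}^M$ have the same covariance matrix by invoking property 2) of the ECDQ. Here, introducing the prefilter $A_i^{-1}$ and the postfilter $A_i$ is equivalent to shaping $Q_n(\cdot)$ by $A_i$, which induces a new quantizer $Q_{i,n}(\cdot)$ given by $Q_{i,n}(\mathbf{x}) = A_iQ_n(A_i^{-1}\mathbf{x})$.

Suppose $K_{\mathbf{B}_i}$ is singular for some $i$, say $K_{\mathbf{B}_i}$ is of rank $k$ with $k < n$. For this type of degenerate case, the quantization operation should be carried out in the nonsingular subspace of $K_{\mathbf{B}_i}$. Let $K_{\mathbf{B}_i} = U\Lambda U^T$ be the eigenvalue decomposition of $K_{\mathbf{B}_i}$. Without loss of generality, assume $\Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_k, 0, \ldots, 0\}$, where $\lambda_j > 0$ for all $j = 1, 2, \ldots, k$. Define $\Lambda_k = \mathrm{diag}\{\lambda_1, \ldots, \lambda_k\}$. Now replace the $n$-dimensional quantizer $Q_{i-1,n}(\cdot)$ in (54) by a $k$-dimensional quantizer $Q_{i-1,k}(\cdot)$, and replace the dither $\mathbf{Z}_{i-1}$ by a dither $\mathbf{Z}_{i-1,k}$, which is a $k$-dimensional random vector uniformly distributed over the basic cell of $Q_{i-1,k}$ with $E\mathbf{Z}_{i-1,k}\mathbf{Z}_{i-1,k}^T = \Lambda_k$. Let

$$\hat{\mathbf{X}}_{i,k} = Q_{i-1,k}\left(\left[U^TK_{i-1}\hat{\mathbf{X}}^{i-1}\right]_{1,k} + \mathbf{Z}_{i-1,k}\right) - \mathbf{Z}_{i-1,k},$$

and we have

$$\hat{\mathbf{X}}_i = U\begin{pmatrix}\hat{\mathbf{X}}_{i,k}\\ \left[U^TK_{i-1}\hat{\mathbf{X}}^{i-1}\right]_{k+1,n}\end{pmatrix},$$

where $\left[U^TK_{i-1}\hat{\mathbf{X}}^{i-1}\right]_{1,k}$ is a column vector containing the first $k$ entries of $U^TK_{i-1}\hat{\mathbf{X}}^{i-1}$ and $\left[U^TK_{i-1}\hat{\mathbf{X}}^{i-1}\right]_{k+1,n}$ is a column vector that contains the remaining entries of $U^TK_{i-1}\hat{\mathbf{X}}^{i-1}$.

10 $A_i$ is in general not unique, even if we view $A_i$ and $-A_i$ as the same matrix. For example, let $K_{\mathbf{Z}'} = U_1U_1^T$ be the Cholesky decomposition of $K_{\mathbf{Z}'}$ and $K_{\mathbf{B}_{i+1}} = U_2U_2^T$ be the Cholesky decomposition of $K_{\mathbf{B}_{i+1}}$, where $U_1$ and $U_2$ are lower triangular matrices. We can set $A_i = U_2U_1^{-1}$. Let $K_{\mathbf{Z}'} = V_1\Lambda_1V_1^T$ and $K_{\mathbf{B}_{i+1}} = V_2\Lambda_2V_2^T$ be the eigenvalue decompositions of $K_{\mathbf{Z}'}$ and $K_{\mathbf{B}_{i+1}}$, respectively. We can also set $A_i = V_2\Lambda_2^{1/2}\Lambda_1^{-1/2}V_1^T$.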

Appendix II
Proof of Theorem 5.3



It is easy to verify that as D\, D 2 , D 3 — > 0, we have St — ► 1, 



a Tq +<? t . 
D 3 



1, and 



1, i = 1,2. 



Let erf. € 



0, M (j^W D i - A3 + \/A 2 - A3) 2 + As) , where M is a fixed large number. Clearly, of 3 -> as 
A,A2,A3^0. 

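For the MD quantization scheme shown in Fig. 5, we have

$$\begin{aligned} R_1 &= \frac{1}{n}H\left(Q_{2,n}^*(b_1^*\mathbf{X} + b_2^*\mathbf{W}_2' + \mathbf{Z}_2^*)\,\middle|\,\mathbf{Z}_2^*\right)\\ &= \frac{1}{n}h(\mathbf{X} + b_2^*\mathbf{N}_1^* + \mathbf{N}_2^*) - \frac{1}{n}h(\mathbf{N}_2^*)\\ &= \frac{1}{n}h(\mathbf{X} + b_2^*\mathbf{N}_1^* + \mathbf{N}_2^*) - \frac{1}{2}\log\frac{EB_3^2}{G_n^{\mathrm{opt}}}, \end{aligned}$$

$$\begin{aligned} R_2 &= \frac{1}{n}H\left(Q_{1,n}^*(\mathbf{X} + \mathbf{Z}_1^*)\,\middle|\,\mathbf{Z}_1^*\right) + \frac{1}{n}H\left(Q_{3,n}^*(b_3^*\mathbf{X} + b_4^*\mathbf{W}_1' + b_5^*\mathbf{W}_2' + \mathbf{Z}_3^*)\,\middle|\,\mathbf{Z}_3^*\right)\\ &= \frac{1}{n}h(\mathbf{X} + \mathbf{N}_1^*) - \frac{1}{n}h(\mathbf{N}_1^*) + \frac{1}{n}h(b_3^*\mathbf{X} + b_4^*\mathbf{W}_1' + b_5^*\mathbf{W}_2' + \mathbf{N}_3^*) - \frac{1}{n}h(\mathbf{N}_3^*)\\ &\le \frac{1}{n}h(\mathbf{X} + \mathbf{N}_1^*) - \frac{1}{n}h(\mathbf{N}_1^*) + \frac{1}{n}h(b_7\mathbf{B}_2 + b_8\mathbf{B}_3 + \mathbf{B}_4) - \frac{1}{n}h(\mathbf{N}_3^*)\\ &\le \frac{1}{n}h(\mathbf{X} + \mathbf{N}_1^*) - \frac{1}{2}\log\frac{EB_2^2}{G_n^{\mathrm{opt}}} + \frac{1}{2}\log\left[2\pi e\left(b_7^2EB_2^2 + b_8^2EB_3^2 + EB_4^2\right)\right] - \frac{1}{2}\log\frac{EB_4^2}{G_n^{\mathrm{opt}}}. \end{aligned}$$

Since $\frac{1}{n}h(\mathbf{X} + \mathbf{N}_1^*) = h(X) + o(1)$ and $\frac{1}{n}h(\mathbf{X} + b_2^*\mathbf{N}_1^* + \mathbf{N}_2^*) = h(X) + o(1)$ as $D_1, D_2, D_3 \to 0$, it follows that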



fli = 5 log ^ + 



1 



log- 



Px(A 2 + 4 3 ) 



1 



2 A3(VAi - A 3 + VA2 - D 3 y +a^ 3 Di 2 



— 2 

#2 < - \ log S + i log [ 2 7re(b 2 7 EB 2 2 + 6 2 Es1 + E£ 2 )] - i log ^ + o(l) 



1, Px 



1 A 3 (VAi - A 3 + y/D 2 - A3) 2 + oj ft 



So we have 



A 2 + 0^ 2 



i?l + i?2 < X log 



77 lo g 



A 3 (VA - D 3 + y/D 2 - A3) 



+ log(2 7 reG^ t ) + o(l). 



p2 



2 ° A 2 (VA - D 3 + y/D 2 - A3) 2 2 



+ -log(27reGr). 
When er^ = 0, there is no quantization splitting and the quantizer ra (-) can be removed. In this case, we have
Ri = ilog^ + ilog(27reO+ (l) 



When erf, = M 



§f (V-Di - Da + VD 2 - D 3 ) 2 + D 2 



we have 



where $\epsilon(M) \to 0$ as $M \to \infty$. Therefore, the region

fli = ilog^ + ilog(27r e Gr) + e(M)+ (l), 

#2 = ilog^ + ilog(27r e Gr) + (l), 

is achievable. 

By symmetry, the region 

Ri = 5log^ + ilog(27reG^)+ (l), 

R2 = ilog^ + ilog(27reG°f t )+e(M) + (l), 

is achievable via the other form of quantization splitting. The desired result follows by combining these two regions 
and choosing M large enough. 

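Appendix III
Proof of Theorem 5.4

We shall only give a heuristic argument here. The rigorous proof is similar to that of Theorem 3 in [37] and thus is omitted.

It is well known that the distribution of the quantization noise converges to a white Gaussian distribution in the divergence sense as the dimension of the optimal lattice becomes large [37]. So we can approximate $\mathbf{N}_i^*$ by $\mathbf{N}_i^G$, where $\mathbf{N}_i^G$ is a zero-mean Gaussian vector with the same covariance as that of $\mathbf{N}_i^*$, $i = 1, 2, 3$. Therefore, for large $n$, we have

$$\frac{1}{n}h(\mathbf{X} + \mathbf{N}_1^*) - \frac{1}{n}h(\mathbf{N}_1^*) \approx \frac{1}{n}h(\mathbf{X} + \mathbf{N}_1^G) - \frac{1}{n}h(\mathbf{N}_1^G) = h(X + N_1^G) - h(N_1^G),$$

$$\begin{aligned} \frac{1}{n}h(b_1^*\mathbf{X} + b_2^*\mathbf{W}_2' + \mathbf{N}_2^*) - \frac{1}{n}h(\mathbf{N}_2^*) &= \frac{1}{n}h(\mathbf{X} + b_2^*\mathbf{N}_1^* + \mathbf{N}_2^*) - \frac{1}{n}h(\mathbf{N}_2^*)\\ &\approx \frac{1}{n}h(\mathbf{X} + b_2^*\mathbf{N}_1^G + \mathbf{N}_2^G) - \frac{1}{n}h(\mathbf{N}_2^G)\\ &= h(X + b_2^*N_1^G + N_2^G) - h(N_2^G), \end{aligned}$$

and

$$\begin{aligned} &\frac{1}{n}h(b_3^*\mathbf{X} + b_4^*\mathbf{W}_1' + b_5^*\mathbf{W}_2' + \mathbf{N}_3^*) - \frac{1}{n}h(\mathbf{N}_3^*)\\ &\quad= \frac{1}{n}h\left((b_3^* + b_1^*b_4^* + b_2^*b_4^* + b_5^*)\mathbf{X} + (b_2^*b_4^* + b_5^*)\mathbf{N}_1^* + b_4^*\mathbf{N}_2^* + \mathbf{N}_3^*\right) - \frac{1}{n}h(\mathbf{N}_3^*)\\ &\quad\approx \frac{1}{n}h\left((b_3^* + b_1^*b_4^* + b_2^*b_4^* + b_5^*)\mathbf{X} + (b_2^*b_4^* + b_5^*)\mathbf{N}_1^G + b_4^*\mathbf{N}_2^G + \mathbf{N}_3^G\right) - \frac{1}{n}h(\mathbf{N}_3^G)\\ &\quad= h\left((b_3^* + b_1^*b_4^* + b_2^*b_4^* + b_5^*)X + (b_2^*b_4^* + b_5^*)N_1^G + b_4^*N_2^G + N_3^G\right) - h(N_3^G). \end{aligned}$$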

Appendix IV

The Calculation of the Scalar Operating Point Using Successive Quantization

Observe in Fig. 15(c) that the value of $\alpha_2$ is slightly different from $-1$, such that a portion of the $C_s$ cells consist of three intervals of length $\frac{1}{2}\Delta_b$ that are approximately $-\frac{\alpha_2}{2}\Delta_a$ apart (denote the set of this first class of cells as $C_s'$), while the other $C_s$ cells consist of only two intervals of length $\frac{1}{2}\Delta_b$ that are also $-\frac{\alpha_2}{2}\Delta_a$ apart (denote the set of this second class of cells as $C_s''$); the ratio between the cardinalities of these two sets is a function of $\alpha_2$, which is approximately $\frac{-3 - 3\alpha_2}{4 + 3\alpha_2}$. Here we again ignore the cells $C_s$ whose constituent segments are at the border of the $q_a$ partition cells, which form a negligible portion when $\Delta_a \gg \Delta_b$. The average distortion for each first-class cell $C_s$ is approximately $\frac{2}{3}\left(-\frac{\alpha_2}{2}\Delta_a\right)^2$, while the average distortion for each second-class cell $C_s$ is approximately $\left(\frac{1}{2}\cdot\frac{-\alpha_2}{2}\Delta_a\right)^2$. Thus, the distortion $D_2$ can be approximated as

$$D_2 \approx (-3 - 3\alpha_2)\cdot\frac{2}{3}\left(-\frac{\alpha_2}{2}\Delta_a\right)^2 + (4 + 3\alpha_2)\left(\frac{1}{2}\cdot\frac{-\alpha_2}{2}\Delta_a\right)^2 = -\frac{1}{16}(5\alpha_2 + 4)\alpha_2^2\Delta_a^2. \tag{55}$$

Notice that $(-3 - 3\alpha_2) + (4 + 3\alpha_2) = 1$; thus, $(-3 - 3\alpha_2)$ is the fraction of first-class cells among all the $C_s$ cells. Letting $D_1 = D_2 = \frac{1}{12}\Delta_a^2$, we can solve for $\alpha_2$; the only real solution to this equation is $\alpha_2 = -1.0445$. The distortion $D_3$ is approximately $\frac{1}{12}\left(\frac{1}{2}\Delta_b\right)^2$, since the joint partition is almost uniform with stepsize $\frac{1}{2}\Delta_b$. To approximate the entropy rate of $q_b$, consider the rate contribution from the first-class $C_s$ cells, namely

$$R_2' \approx -\sum_{C_s(i)\in C_s'}p(q_2^{-1}(i))\tfrac{3}{2}\Delta_b\log_2\left(p(q_2^{-1}(i))\tfrac{3}{2}\Delta_b\right) \approx (3 + 3\alpha_2)\log_2\left(\tfrac{3}{2}\Delta_b\right) - \sum_{C_s(i)\in C_s'}p(q_2^{-1}(i))\tfrac{3}{2}\Delta_b\log_2 p(q_2^{-1}(i)), \tag{56}$$

where $p(x)$ is the pdf of the source, and the second approximation comes from taking the fraction of first-class cells among all the $C_s$ cells as the probability that a random $C_s$ cell is a first-class cell. Similarly, the rate contribution from the second-class $C_s$ cells is

$$R_2'' \approx -\sum_{C_s(i)\in C_s''}p(q_2^{-1}(i))\Delta_b\log_2\left(p(q_2^{-1}(i))\Delta_b\right) \approx -(4 + 3\alpha_2)\log_2\Delta_b - \sum_{C_s(i)\in C_s''}p(q_2^{-1}(i))\Delta_b\log_2 p(q_2^{-1}(i)). \tag{57}$$

Thus, the rate $R_2$ can be approximated as

$$R_2 = R_2' + R_2'' \approx -\log_2\Delta_b + (3 + 3\alpha_2)\log_2\tfrac{3}{2} - \sum_{C_s(i)\in C_s'}p(q_2^{-1}(i))\tfrac{3}{2}\Delta_b\log_2 p(q_2^{-1}(i)) - \sum_{C_s(i)\in C_s''}p(q_2^{-1}(i))\Delta_b\log_2 p(q_2^{-1}(i)). \tag{58}$$

When $q_a(\cdot)$ is high resolution, $p(q_2^{-1}(i))$ is approximately equal to $p(x)$ for any $x \in C_s(i)$, and thus the last two terms of (58) together approximate the differential entropy $h(p) = -\int p(x)\log_2 p(x)\,dx$, so that

$$R_2 \approx h(p) - \log_2\Delta_b + (3 + 3\alpha_2)\log_2\tfrac{3}{2},$$

where $h(p) = \frac{1}{2}\log_2(2\pi e\sigma_X^2)$ for the Gaussian source. Thus, $D_3 \approx \frac{1}{12}\left(\frac{1}{2}\Delta_b\right)^2 \approx 0.8974\cdot\frac{\pi e}{24}2^{-2R_2}\sigma_X^2$.